Kevin Karhan :verified: · @kkarhan
1437 followers · 102490 posts · Server mstdn.social

@christiansilvermooon #Discord prevents #crawlers, so it's impossible to search through shit, literally bricking workflows for people.

#cantuse & #wontuse!

#wontuse #cantuse #crawlers #Discord

Last updated 1 year ago

Kevin Karhan :verified: · @kkarhan
1399 followers · 97755 posts · Server mstdn.social

@dogriley @gallaugher That being said, your mileage may vary greatly.

For example, #crawlers can't be banned in Germany if they act with "legitimate interest" [i.e. price comparison systems]...

There was a court case of an airline trying to ban crawlers from accessing their site, and said airline lost against the comparison site.
internetworld.de/digitaler-han

#notlegaladvice OFC!

#notlegaladvice #crawlers

Last updated 1 year ago

Guilherme aparentemente famoso · @gmgall
206 followers · 1712 posts · Server ursal.zone

I think the most worrying point of the article is the one that comes right before the conclusion: LLMs and crawlers can simply not give a shit about what I say about my content.

Once my content gets into the giant mix of text that feeds a model... I'm kind of screwed, right? How am I going to prove it?

searchengineland.com/robots-tx

#LLM #crawlers #web

Last updated 1 year ago

Kevin Karhan :verified: · @kkarhan
1117 followers · 73125 posts · Server mstdn.social

@wonkothesane @vantablack not even close.

#crawlers can be detected very well and are usually blocked, or at least throttled down to be less effective than single human users.

Whereas #Federation inevitably requires allowing access to huge amounts of data, even if it's just public posts and profiles...

But still, that is valuable that wants to gather and sell to any , as it's more cost-effective than necessitating operator hours.

#targetedsurveillance #surveillancestate #NSA #masssurveillance #Federation #crawlers

Last updated 1 year ago

Sam Johnson · @8tunesat8
30 followers · 711 posts · Server home.social


Watch "Crawlers - That Time Of Year Always (Official Video)" on YouTube
youtu.be/UdvplNzQDNA

#crawlers #newmusic #8tunesat8

Last updated 2 years ago

· @hertg
73 followers · 116 posts · Server infosec.exchange

I published a login theme on Github with screenshots in the README. I appropriately named the screenshot of the prompt password.jpg and it is the most accessed file of the repository excluding the README.md.

I see you sneaky little .

github.com/hertg/lightdm-neon

#password #hackers #login #theme #lightdm #linux #github #crawlers

Last updated 2 years ago

Angus McIntyre · @angusm
511 followers · 483 posts · Server mastodon.social

Unsurprisingly, webmeup's assurance that "you will not see recurring requests from the BLEXBot crawler to the same page" turns out to be ... not true?

At least according to my log files, which show the same page getting hit at 5 day intervals as part of their process of fetching every single page on my site over and over to satisfy some vague marketing need.

So I think BLEXBot can join AHRefsBot and SEMRushBot in my robots.txt. And nothing of value was lost.
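
A rough way to sanity-check such a robots.txt before deploying it, sketched with Python's standard `urllib.robotparser`. The rules below are an assumed example and not the author's actual file. The bot names are spelled as in the post, and the `example.org` URL is a placeholder.

```python
from urllib.robotparser import RobotFileParser

# Assumed example rules: shut out the three SEO bots named in the post,
# leave everything open for all other user agents.
ROBOTS_TXT = """\
User-agent: BLEXBot
Disallow: /

User-agent: AHRefsBot
Disallow: /

User-agent: SEMRushBot
Disallow: /

User-agent: *
Disallow:
"""


def allowed(agent, url="https://example.org/some/page"):
    """Return True if `agent` may fetch `url` under ROBOTS_TXT."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for agent in ("BLEXBot", "AHRefsBot", "SEMRushBot", "SomeOtherBot"):
        print(agent, "allowed" if allowed(agent) else "blocked")
```

This only shows how a well-behaved parser reads the rules. Whether a given bot actually honours them is up to the bot.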

#crawlers #webspiders #robotstxt

Last updated 2 years ago

Ben Tasker · @ben
309 followers · 916 posts · Server mastodon.bentasker.co.uk

New #blog: Tightening control over public endpoints

The concern in fediblock around @cloy's plans earlier in the week prompted me to put my hat on and look into ways to make it harder for external to hit Mastodon's API feeds.

This post suggests a possible solution for concerned instance admins as well as details of some I spotted.

bentasker.co.uk/posts/blog/sec

#blog #security #mastodon #api #fedisearch #infosec #scrapers #crawlers

Last updated 2 years ago

Angus McIntyre · @angusm
355 followers · 329 posts · Server mastodon.social

One thing some web crawlers seem to be particularly dumb about is handling 301 Moved Permanently. A lot of bots are still requesting content on my sites that was marked as "moved permanently" or even 410 Gone several years ago ... and they'll be back tomorrow to ask for it again.

This seems inefficient to me, but ¯\_(ツ)_/¯ ...
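
For illustration, a minimal sketch of what remembering these responses could look like on the crawler side. `UrlMemory` and its methods are hypothetical names, not taken from any real crawler, and only the two status codes mentioned above (301 and 410) are handled.

```python
from http import HTTPStatus


class UrlMemory:
    """Remembers permanent redirects and gone URLs between crawl runs."""

    def __init__(self):
        self.moved = {}    # old URL -> new URL, learned from 301 responses
        self.gone = set()  # URLs that answered 410 Gone

    def record(self, url, status, location=None):
        """Store what a response told us about `url`."""
        if status == HTTPStatus.MOVED_PERMANENTLY and location:
            self.moved[url] = location
        elif status == HTTPStatus.GONE:
            self.gone.add(url)

    def resolve(self, url):
        """Return the URL to request next time, or None to drop it."""
        seen = set()
        # Follow remembered redirects; `seen` guards against loops.
        while url in self.moved and url not in seen:
            seen.add(url)
            url = self.moved[url]
        return None if url in self.gone else url


if __name__ == "__main__":
    memory = UrlMemory()
    memory.record("/old-page", 301, "/new-page")
    memory.record("/deleted-page", 410)
    print(memory.resolve("/old-page"))      # /new-page
    print(memory.resolve("/deleted-page"))  # None
```

A real crawler would persist this state between runs. The point is only that neither response needs to be rediscovered every day.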

#bots #crawlers #seo #search #web #webcrawlers #httpstatus #http

Last updated 2 years ago

ilove_ur_smile · @ilove_ur_smile
48 followers · 48 posts · Server pouet.chapril.org

By the way, I was there for the site soundofbrit.fr and took a few nice photos!
Here's one as an exclusive 💥
The rest is coming soon on the site ✨️

#photographie #concert #crawlers

Last updated 2 years ago

thissorryspacesuit · @thissorryspacesuit
275 followers · 111 posts · Server mastodon.art

I have a project called "Welcome to #heirloom." This project includes short #stories, #paintings and audio, all related to the mysterious town of Heirloom and its inhabitants.
Here are riders in the desert outside of Heirloom on the backs of their #crawlers. Crawlers are a monastic sect of giants who refuse to stand and devote their time to being brute animals of travel, crawling around for the various inhabitants of Heirloom in hopes of achieving enlightenment.

#writing #sketch #MastoArt #art #crawlers #paintings #stories #heirloom

Last updated 2 years ago

LisPi · @lispi314
71 followers · 1425 posts · Server mastodon.top

Sites that refuse to provide #download options for reasons such as "resource saving", or that strictly limit such options, seem to fail to realize that all they're incentivizing is the creation of plain #web #crawlers less broken than their #api instead.

Perhaps they should consider providing adequate options on their terms instead?

Non-#private #torrent is a non-option. It's not that hard to use #i2p.

#download #web #crawlers #api #private #torrent #i2p #media #archival #html #design #crawler

Last updated 3 years ago

@adnan360
Is there an #rfc in the #webstandards about NOINDEX/HASHTAG_INDEX?

That is where we are going here, right?

#rfc #webstandards #hashtagIndex #bots #nobot #crawlers #noindex

Last updated 3 years ago

James Mullarkey · @jamesmullarkey
359 followers · 994 posts · Server w3c.social

#fediverse peeps:

If I want to block several #google #crawlers from my website, is it excessive/pointless to use:

1. .htaccess
2. robots.txt and
3. <meta name> in the header?

Need to send a clear message to their bots that they ain't welcome.

Belt / braces / another belt approach seems like it's making things crystal.

Thanks :)
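
A rough sketch of how the three layers differ, written in Python rather than as actual .htaccess or HTML files. The crawler names are hypothetical placeholders and the function names are made up for illustration. Only the first layer actually refuses requests, while the other two are advisory and depend on the crawler choosing to obey them.

```python
# Hypothetical crawler names, used purely for illustration.
BLOCKED_AGENTS = ["ExampleBot", "AnotherBot"]


def is_blocked(user_agent, agents=BLOCKED_AGENTS):
    """Layer 1: server-side refusal, the job an .htaccess rule would do.

    Returns True when the User-Agent header contains a blocked name.
    """
    ua = user_agent.lower()
    return any(agent.lower() in ua for agent in agents)


def robots_txt(agents=BLOCKED_AGENTS):
    """Layer 2: robots.txt text with one 'Disallow: /' group per crawler."""
    groups = [f"User-agent: {agent}\nDisallow: /" for agent in agents]
    return "\n\n".join(groups) + "\n"


def meta_tags(agents=BLOCKED_AGENTS):
    """Layer 3: per-crawler robots meta tags for the page <head>."""
    return [
        f'<meta name="{agent.lower()}" content="noindex, nofollow">'
        for agent in agents
    ]


if __name__ == "__main__":
    print(is_blocked("Mozilla/5.0 (compatible; ExampleBot/2.1)"))
    print(robots_txt())
    print("\n".join(meta_tags()))
```

So the combination is not pointless, but the layers are not equivalent: a crawler that ignores robots.txt and meta tags is only stopped by the server-side check, and only as long as it sends an honest User-Agent.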

#fediverse #google #crawlers #htaccess #robots #bots #surveillancecapitalism

Last updated 6 years ago

🌈 Lascapi · @lascapi
289 followers · 6468 posts · Server mastodon.zaclys.com

#crawlers are devouring the map (#carte):
...By continuing to use the word "map", we think of stability and immobility, whereas the simulated physical worlds (#mondes #physiques) and the crawlers produce and consume a flow (#flux) ...
or not?
transportsdufutur.ademe.fr/201

#crawlers #carte #mondes #physiques #flux #map #interface #opensource

Last updated 8 years ago