@dogriley @gallaugher That being said, your mileage will vary greatly.
For example, #crawlers can't be banned in Germany if they act with "legitimate interest" (e.g. price comparison systems)...
There was a court case in which an airline tried to ban crawlers from accessing its site, and the airline lost against the comparison site.
https://www.internetworld.de/digitaler-handel/rechtstipp/screen-scraping-erlaubt-473348.html
#NotLegalAdvice OFC!
I think the most worrying point in the article is the one right before the conclusion: LLMs and crawlers can simply not give a shit about what I say about my content.
Once my content gets into the giant mix of text that feeds a model... I'm kind of screwed, right? How am I going to prove it?
https://searchengineland.com/robots-txt-new-meta-tag-llm-ai-429510
@wonkothesane @vantablack not even close.
#Crawlers can be detected very well and usually blocked, or at least throttled down to be less effective than single human users.
Whereas #Federation inevitably requires allowing access to huge amounts of data, even if it's just public posts and profiles...
But still, that is valuable #MassSurveillance that #NSA wants to gather and sell to any #SurveillanceState, as it's more cost-effective than #targetedSurveillance necessitating operator hours.
#targetedsurveillance #surveillancestate #NSA #masssurveillance #Federation #crawlers
Crawlers have booked a new London show for June #2023_05_22 #upset #sam_taylor #news #about_to_break #crawlers
>> https://upsetmagazine.com/news/crawlers-london-show-jun23/
#Crawlers #NewMusic #8tunesat8
Watch "Crawlers - That Time Of Year Always (Official Video)" on YouTube
https://youtu.be/UdvplNzQDNA
I published a login theme on GitHub with screenshots in the README. I appropriately named the screenshot of the #password prompt password.jpg, and it is the most accessed file in the repository, excluding README.md.
I see you sneaky little #hackers.
#password #hackers #login #theme #lightdm #linux #github #crawlers
Crawlers have released a new four-track live EP #2023_02_17 #upset #sam_taylor #news #crawlers
About To Break 2023: Crawlers #2023_01_18 #upset #alex_ingle #features #about_to_break #about_to_break_2023 #crawlers #featured
>> https://upsetmagazine.com/features/about-to-break-2023-crawlers/
Unsurprisingly, webmeup's assurance that "you will not see recurring requests from the BLEXBot crawler to the same page" turns out to be ... not true?
At least according to my log files, which show the same page getting hit at 5-day intervals as part of their process of fetching every single page on my site over and over to satisfy some vague marketing need.
So I think BLEXBot can join AhrefsBot and SemrushBot in my robots.txt. And nothing of value was lost.
#crawlers #webspiders #robotstxt
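For anyone wanting to check a rule set like that before deploying it, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the `blocked_agents` helper and the exact user-agent tokens are assumptions for illustration, not the file from the post. It also only tells you what a well-behaved bot would do; it does not stop a crawler that ignores robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut out three SEO crawlers, allow everyone else.
ROBOTS_TXT = """\
User-agent: BLEXBot
Disallow: /

User-agent: AhrefsBot
Disallow: /

User-agent: SemrushBot
Disallow: /

User-agent: *
Disallow:
"""


def blocked_agents(robots_txt, agents, url="/"):
    """Return the agents that this robots.txt forbids from fetching `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


if __name__ == "__main__":
    print(blocked_agents(ROBOTS_TXT, ["BLEXBot", "AhrefsBot", "SemrushBot", "Googlebot"]))
```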
New #blog: Tightening #security control over #mastodon public #api endpoints
The concern in fediblock around @cloy's #fedisearch plans earlier in the week prompted me to put my #infosec hat on and look into ways to make it harder for external #scrapers to hit Mastodon's API feeds.
This post suggests a possible solution for concerned instance admins as well as details of some #crawlers I spotted.
#blog #security #mastodon #api #fedisearch #infosec #scrapers #crawlers
One thing some web crawlers seem to be particularly dumb about is handling 301 Moved Permanently. A lot of bots are still requesting content on my sites that was marked as "moved permanently" or even 410 Gone several years ago ... and they'll be back tomorrow to ask for it again.
This seems inefficient to me, but ¯\_(ツ)_/¯ ...
#bots #crawlers #seo #search #web #webcrawlers #httpstatus #http
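What a less wasteful crawler could do, as a rough sketch: remember permanent answers so it never asks for the same dead URL twice. The `PermanentStatusCache` class and its method names are hypothetical and not taken from any real crawler; a production version would persist the cache and probably expire entries eventually.

```python
from typing import Dict, Optional, Set


class PermanentStatusCache:
    """Remembers 301 and 410 responses so they are not requested again."""

    def __init__(self) -> None:
        self.moved: Dict[str, str] = {}  # old URL -> new URL (from 301)
        self.gone: Set[str] = set()      # URLs that answered 410

    def record(self, url: str, status: int, location: Optional[str] = None) -> None:
        """Store the outcome of a fetch if it was a permanent one."""
        if status == 301 and location:
            self.moved[url] = location
        elif status == 410:
            self.gone.add(url)

    def resolve(self, url: str) -> Optional[str]:
        """Return the URL worth fetching, or None if it is known to be gone."""
        seen: Set[str] = set()
        # Follow remembered redirects, stopping if they loop.
        while url in self.moved and url not in seen:
            seen.add(url)
            url = self.moved[url]
        return None if url in self.gone else url


if __name__ == "__main__":
    cache = PermanentStatusCache()
    cache.record("/old-post", 301, "/new-post")
    cache.record("/deleted", 410)
    print(cache.resolve("/old-post"), cache.resolve("/deleted"))
```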
Anyway, I was there for the site soundofbrit.fr and took a few nice photos!
Here's an exclusive one 💥
The rest is coming to the site soon ✨️
#Crawlers #Concert #Photographie
I have a project called "Welcome to #Heirloom." This project includes short #stories, #paintings and audio all related to the mysterious town of Heirloom and its inhabitants.
Here are riders in the desert outside of Heirloom on the backs of their #crawlers. Crawlers are a monastic sect of giants who refuse to stand and devote their time to being brute animals of travel, crawling around for the various inhabitants of Heirloom in hopes of achieving enlightenment.
#art #MastoArt #sketch #writing
#writing #sketch #MastoArt #art #crawlers #paintings #stories #heirloom
Sites that refuse to provide #download options for reasons such as "resource saving" or strictly limit such options seem to fail to realize that all they're incentivizing is the creation of plain #web #crawlers less broken than their #api instead.
Perhaps they should consider providing adequate options on their terms instead?
Non #private #torrent is a non-option. It's not that hard to use #i2p
#download #web #crawlers #api #private #torrent #i2p #media #archival #html #design #crawler
@adnan360
Is there an #RFC in the #webStandards about NOINDEX/HASHTAG_INDEX?
That is where we are going here, right?
#rfc #webstandards #hashtagIndex #bots #nobot #crawlers #noindex
#Fediverse peeps:
If I want to block several #Google #crawlers from my website is it excessive/pointless to use:
1. #htaccess
2. #robots.txt and
3. <meta name> in the header?
Need to send a clear message to their bots that they ain't welcome.
Belt / braces / another belt approach seems like it's making things crystal. #bots
Thanks :)
#fediverse #google #crawlers #htaccess #robots #bots #surveillancecapitalism
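A rough sketch of what the three layers amount to, written in Python rather than as real .htaccess rules. The token list, `robots_txt`, `is_blocked` and `META_TAG` are all illustrative assumptions; check Google's published crawler list for the tokens you actually want to block.

```python
from typing import Iterable

# Illustrative tokens only (assumption), not a complete list of Google crawlers.
BLOCKED_TOKENS = ("Googlebot", "AdsBot-Google")

# Layer 3: robots meta tag addressed to Googlebot, placed inside <head>.
META_TAG = '<meta name="googlebot" content="noindex, nofollow">'


def robots_txt(tokens: Iterable[str] = BLOCKED_TOKENS) -> str:
    """Layer 2: one 'Disallow: /' group per crawler token."""
    return "\n".join(f"User-agent: {token}\nDisallow: /\n" for token in tokens)


def is_blocked(user_agent: str, tokens: Iterable[str] = BLOCKED_TOKENS) -> bool:
    """Layer 1: the User-Agent check a server rule would make before sending 403."""
    ua = user_agent.lower()
    return any(token.lower() in ua for token in tokens)


if __name__ == "__main__":
    print(robots_txt())
    print(is_blocked("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"))
```

One thing to keep in mind: a crawler that obeys robots.txt, or gets a 403 from the server, never downloads the page, so it never reads the meta tag. The layers overlap more than they stack.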
#Crawlers are devouring the #map:
...By continuing to use the word "map", we think of stability, of something immobile, whereas the simulated #physical #worlds and the crawlers produce and consume a #flow...
#map #interface #opensource or not?
http://transportsdufutur.ademe.fr/2017/06/crawlers-devorent-carte.html
#crawlers #carte #mondes #physiques #flux #map #interface #opensource