Paul R. Pival (he/him) · @ppival
143 followers · 284 posts · Server glammr.us

Seems like a good idea, and should've been in place from the get-go.

OpenAI launches webcrawler GPTBot, and instructions on how to block it mashable.com/article/open-ai-g

#openai #webcrawlers

Last updated 1 year ago
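
For reference, the block described in the linked article is a `robots.txt` rule. Below is a minimal sketch of what such a rule might look like and how to check it, assuming the `GPTBot` user-agent token from OpenAI's published instructions. The `example.com` URLs and the `is_allowed` helper are hypothetical, and Python's standard `urllib.robotparser` stands in for whatever parser a real crawler uses.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block GPTBot everywhere, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # example.com is a placeholder domain, not taken from the post.
    print(is_allowed(ROBOTS_TXT, "GPTBot", "https://example.com/page"))
    print(is_allowed(ROBOTS_TXT, "SomeOtherBot", "https://example.com/page"))
```

This only has an effect on crawlers that choose to honor `robots.txt`; it is a request, not an access control.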

Angus McIntyre · @angusm
578 followers · 585 posts · Server mastodon.social

What the actual fuck?

Will someone kindly explain to "global cybersecurity leader" Palo Alto Networks that the User-Agent header is a place to put the name of your user agent? You send the name of your user agent, and you obey `robots.txt` (which they don't, of course). You DO NOT write a short essay ending with a request for people to mail you to opt out. It is 2023 and the right way to do this was established DECADES ago.

#paloaltonetworks #clownshoes #robotstxt #webcrawlers #www #web

Last updated 2 years ago
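
The convention the post refers to can be sketched roughly as follows: a short product token in the User-Agent header, and a `robots.txt` check before any request is made. Everything named here is an assumption for illustration: the bot name `ExampleBot`, the `example.com` URLs, and the `build_request` helper are all hypothetical, and the robots.txt text is passed in rather than fetched to keep the sketch offline.

```python
from typing import Optional
from urllib.request import Request
from urllib.robotparser import RobotFileParser

# Hypothetical bot identity: a short product token plus a contact URL,
# rather than a paragraph of prose.
BOT_TOKEN = "ExampleBot"
USER_AGENT = f"{BOT_TOKEN}/1.0 (+https://example.com/bot)"


def build_request(url: str, robots_txt: str) -> Optional[Request]:
    """Return a Request for url, or None if robots_txt disallows it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    if not parser.can_fetch(USER_AGENT, url):
        return None  # The site opted out for this path; do not fetch.
    return Request(url, headers={"User-Agent": USER_AGENT})


if __name__ == "__main__":
    robots = "User-agent: ExampleBot\nDisallow: /private/\n"
    print(build_request("https://example.com/public", robots) is not None)
    print(build_request("https://example.com/private/x", robots) is None)
```

With this arrangement a site owner opts out by editing `robots.txt`, with no email required.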

Angus McIntyre · @angusm
355 followers · 329 posts · Server mastodon.social

One thing some web crawlers seem to be particularly dumb about is handling 301 Moved Permanently. A lot of bots are still requesting content on my sites that was marked as "moved permanently" or even 410 Gone several years ago ... and they'll be back tomorrow to ask for it again.

This seems inefficient to me, but ¯\_(ツ)_/¯ ...

#bots #crawlers #seo #search #web #webcrawlers #httpstatus #http

Last updated 2 years ago
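
For illustration, here is one way a crawler might act on those two status codes so it stops re-requesting stale URLs. This is a sketch under assumptions, not a description of how any particular bot works: the `update_frontier` helper, the set-based URL list, and the `example.com` URLs are all hypothetical.

```python
from typing import Optional, Set
from urllib.parse import urljoin


def update_frontier(
    frontier: Set[str],
    url: str,
    status: int,
    location: Optional[str] = None,
) -> None:
    """Update the set of URLs to revisit based on one response."""
    if status == 301 and location:
        # Moved Permanently: forget the old URL, remember the new one.
        frontier.discard(url)
        frontier.add(urljoin(url, location))
    elif status == 410:
        # Gone: the resource was removed on purpose; stop asking for it.
        frontier.discard(url)
    # Any other status leaves the frontier unchanged in this sketch.


if __name__ == "__main__":
    frontier = {
        "https://example.com/old",
        "https://example.com/gone",
        "https://example.com/ok",
    }
    update_frontier(frontier, "https://example.com/old", 301, "/new")
    update_frontier(frontier, "https://example.com/gone", 410)
    update_frontier(frontier, "https://example.com/ok", 200)
    print(sorted(frontier))
```

The key point is that the stored URL list changes after the response, so the next crawl does not repeat the same request.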
