Nayla Salibi · @Salibi
202 followers · 587 posts · Server social.tchncs.de

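Websites are blocking access by GPTBot, the crawler OpenAI developed to collect data on the grounds of improving the "accuracy of the AI models" it develops. Why are these sites worried about having their data harvested? And how can the bot be blocked from harvesting website data? With regards from Nayla Salibi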

mc-d.co/1uQu

#ويب #gptbot #openai #الذكاء_الاصطناعي #الروبوت #نايلةالصليبي #ai #web_crawler

Last updated 1 year ago

Redhotcyber · @redhotcyber
588 followers · 1789 posts · Server mastodon.bida.im
Ian Brown :fedi: · @1br0wn
2175 followers · 2094 posts · Server eupolicy.social
beSpacific · @bespacific
1106 followers · 2068 posts · Server newsie.social

The New York Times has blocked OpenAI’s web crawler, meaning that OpenAI can’t use content from the publication to train its AI models. If you check the NYT’s robots.txt page, you can see that the NYT disallows GPTBot, the crawler that OpenAI introduced earlier this month. Based on the Internet Archive’s Wayback Machine, it appears NYT blocked the crawler as early as August 17th. theverge.com/2023/8/21/2384070

#newyorktimes #openai #WebCrawler #gptbot #internetarchive #waybackmachine #copyright #legalresearch

Last updated 1 year ago

· @shaun
223 followers · 1189 posts · Server mastodon.xyz

Persistent little fuckers, aren't they.

#openai #gptbot #chatgpt

Last updated 1 year ago

PrivacyDigest · @PrivacyDigest
541 followers · 2002 posts · Server mas.to

Sites scramble to block ChatGPT web crawler after instructions emerge

Without announcement, OpenAI recently added details about its web crawler, GPTBot, to its online documentation site.

arstechnica.com/?p=1960108

#privacy #gptbot #openai #crawler #chatgpt

Last updated 1 year ago

Paul Chambers · @paul
1828 followers · 9169 posts · Server oldfriends.live

OpenAI GPTBot IP block ranges, if you want to block them from your instance and stop them scraping your content. I saw the Mastodon devs added something to block it via robots.txt a few days ago. Here are the IP ranges:

20.15.240.64/28
20.15.240.80/28
20.15.240.96/28
20.15.240.176/28
20.15.241.0/28
20.15.242.128/28
20.15.242.144/28
20.15.242.192/28
40.83.2.64/28
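
For admins who would rather check addresses in application code than at the firewall, here is a minimal Python sketch using only the standard-library ipaddress module. The names GPTBOT_RANGES and is_gptbot_ip are hypothetical, and the ranges are simply copied from the list above, so treat them as a snapshot that may since have changed.

```python
import ipaddress

# Snapshot of the ranges listed in the post; not fetched live.
GPTBOT_RANGES = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "20.15.240.64/28",
        "20.15.240.80/28",
        "20.15.240.96/28",
        "20.15.240.176/28",
        "20.15.241.0/28",
        "20.15.242.128/28",
        "20.15.242.144/28",
        "20.15.242.192/28",
        "40.83.2.64/28",
    )
]


def is_gptbot_ip(addr: str) -> bool:
    """Return True if addr falls inside any of the listed ranges."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in GPTBOT_RANGES)
```

A request handler could call is_gptbot_ip(remote_addr) and refuse the request when it returns True.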

openai.com/gptbot-ranges.txt

theverge.com/2023/8/7/23823046

github.com/mastodon/mastodon/p

#openai #gptbot #mastoadmin #fediblock

Last updated 1 year ago

bananabob · @bananabob
75 followers · 1781 posts · Server mastodon.nz
IT News · @itnewsbot
3605 followers · 270022 posts · Server schleuss.online

Sites scramble to block ChatGPT web crawler after instructions emerge

Without announcement, OpenAI re... - arstechnica.com/?p=1960108

#ai #tech #openai #gptbot #biz #chatgtp #chatgpt #aiethics #webcrawling #webscraming #machinelearning

Last updated 1 year ago

Tech news from Canada · @TechNews
929 followers · 24992 posts · Server mastodon.roitsystems.ca
@gmgall NFLizado das ideias 🏈 · @gmgall
233 followers · 2453 posts · Server ursal.zone

Sysadmins and network admins on my timeline, the GPTBot IP ranges are as follows.

20.15.240.64/28
20.15.240.80/28
20.15.240.96/28
20.15.240.176/28
20.15.241.0/28
20.15.242.128/28
20.15.242.144/28
20.15.242.192/28
40.83.2.64/28

Happy Blocking!

It is mentioned in the last sentence here, platform.openai.com/docs/gptbo, so it may go unnoticed.

#gptbot #openai #chatgpt #ia #ai

Last updated 1 year ago

Nathaniel Daught · @nfd
184 followers · 1203 posts · Server masto.ai

Can I even customize my robots.txt on Squarespace to stop GPTBot from crawling my site? (Probably not) It's an easy and convenient hosting/builder solution until it's not…

platform.openai.com/docs/gptbo

#ai #squarespace #gptbot

Last updated 1 year ago

Stu · @tehstu
224 followers · 1637 posts · Server hachyderm.io

There's a ZDNet article advising how to do this, but here is a direct link to the instructions for preventing OpenAI's new GPTBot from crawling your site by creating or adding to your site's robots.txt file:

platform.openai.com/docs/gptbo
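
To sanity-check a rule before publishing it, a small Python sketch with the standard-library urllib.robotparser can confirm what a given robots.txt allows. The two-line rule below is the one widely cited for disallowing GPTBot site-wide; the ROBOTS_TXT constant, the can_fetch helper, and the example.com URL are illustrative assumptions, not part of the linked instructions.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt content: disallow GPTBot everywhere, leave others alone.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /
"""


def can_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

This only tells you what the file asks for; it is up to each crawler to honor it.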

#openai #gptbot

Last updated 1 year ago

Simon D. ⏚ · @Siltaer
1086 followers · 11270 posts · Server mamot.fr

OpenAI launches its web crawler, RSF calls on media outlets to block it
nextinpact.com/article/72216/o
With GPTBot, OpenAI is launching a web crawler dedicated to collecting data "from the whole Internet", even as complaints over privacy and copyright violations pile up against the various LLMs deployed on the market.

#rsf #gptbot #openai #llm #chatgpt

Last updated 1 year ago

LWFlouisa · @lwflouisa
5 followers · 266 posts · Server comics.town

But for those who don't know, OpenAI came out with a way to block access by GPTBot.

I'd encourage all GitHub users to make use of robots.txt.

#openai #gptbot

Last updated 1 year ago

Chez Juju (secours) · @Juste_Juju
1 followers · 11 posts · Server ludosphere.fr

In a statement, RSF urges media outlets to configure their news sites to prevent OpenAI/ChatGPT from harvesting their content.

"Media outlets must be paid for their public-interest work, from which the tech giants would like to profit on the cheap."

Let's hope this is followed!

nitter.lacontrevoie.fr/RSF_Tec

#rsf #openai #chatgpt #medias #tech #gptbot

Last updated 1 year ago

Gianmarco :archlinux: :kde: · @gianmarcogg03
295 followers · 3072 posts · Server mastodon.uno

It may be a good idea to block the GPTBot crawler from your websites with something like this (if you use nginx):

# Inside a server block: reject requests whose User-Agent contains "GPTBot"
if ($http_user_agent ~ "GPTBot") {
    return 403;
}

rather than using the robots.txt file since who knows if they really respect it.
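
For sites not fronted by nginx, the same user-agent check can be sketched at the application layer. This is a minimal WSGI middleware in Python, assuming the crawler identifies itself with the substring "GPTBot"; the block_gptbot name and the token parameter are hypothetical, and a client that spoofs its User-Agent would slip past it.

```python
def block_gptbot(app, token="GPTBot"):
    """Wrap a WSGI app so requests whose User-Agent contains token get a 403."""

    def middleware(environ, start_response):
        # WSGI exposes the User-Agent header as HTTP_USER_AGENT.
        if token in environ.get("HTTP_USER_AGENT", ""):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        # Everything else is passed through to the wrapped app untouched.
        return app(environ, start_response)

    return middleware
```

Wrapping an existing app is a one-liner, for example application = block_gptbot(application).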

#gptbot #nginx #ai #openai #chatgpt #GPT4 #gpt5 #copyright #creativecommons

Last updated 1 year ago

ChatGPTroll 🥔 · @Troll
3454 followers · 69425 posts · Server maly.io

If you want to stop OpenAI's crawler from scanning your website and training their model on your content:

🤖

platform.openai.com/docs/gptbo

#gptbot #openai

Last updated 1 year ago

Now you can block OpenAI’s web crawler
OpenAI now lets you block its web crawler from scraping your site to help train GPT models. OpenAI said website operators can specifically disallow its GPTBot crawler in their site's robots.txt file or block its IP address.
theverge.com/2023/8/7/23823046

#openai #webcrawler #gpt #gptbot #robots #privacy #security #robotstxt

Last updated 1 year ago

· @mistersixt
67 followers · 1337 posts · Server kanoa.de

#OpenAI #gptbot

Last updated 1 year ago