FedSearch - Federated network search engine

Redhotcyber · @redhotcyber

588 followers · 1789 posts · Server mastodon.bida.im

OpenAI releases the GPTBot web crawler. It will improve the model's capabilities and will not infringe copyright

#OpenAI has launched the #webcrawler #GPTBot to improve its artificial intelligence models (#intelligenza #artificiale, #AI).

#redhotcyber #online #it #web #ai #hacking #privacy #cybersecurity #cybercrime #intelligence #intelligenzaartificiale #informationsecurity #ethicalhacking #dataprotection #cybersecurityawareness #cybersecuritytraining #cybersecuritynews #infosecurity

https://www.redhotcyber.com/post/openai-rilascia-il-web-crowler-gptbot-migliorera-la-capacita-del-modello-e-non-violera-il-diritto-dautore/

Last updated 2 years ago

Robert Rothenberg · @rrwo

87 followers · 28 posts · Server floss.social

Then there's the SemRush #WebCrawler: it requests robots.txt, which politely tells it to f*** off, so it makes other requests from the same netblock but with a user agent claiming to be Googlebot.

(A whois lookup says that the block is owned by SemRush, though DB-IP lists it as owned by Google. Interesting.)
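
A user agent that merely claims to be Googlebot can be checked with the reverse-then-forward DNS test Google recommends for verifying its crawlers: the requesting IP's PTR record should end in googlebot.com or google.com, and that hostname should resolve back to the same IP. Below is a minimal Python sketch, assuming the standard `socket` resolver; the function names are hypothetical, the suffix list should be checked against Google's current documentation, and the lookups are injectable so the logic can be tried without network access.

```python
import socket
from typing import Callable, List

# Hostname suffixes for genuine Google crawlers (assumed from Google's
# crawler-verification guidance; check the current list before relying on it).
GOOGLE_CRAWLER_SUFFIXES = (".googlebot.com", ".google.com")


def _reverse(ip: str) -> str:
    return socket.gethostbyaddr(ip)[0]


def _forward(host: str) -> List[str]:
    # gethostbyname_ex returns IPv4 addresses only; IPv6 would need getaddrinfo.
    return socket.gethostbyname_ex(host)[2]


def is_verified_googlebot(
    ip: str,
    reverse_lookup: Callable[[str], str] = _reverse,
    forward_lookup: Callable[[str], List[str]] = _forward,
) -> bool:
    """Reverse-resolve ip, check the domain, then forward-resolve back to ip."""
    try:
        hostname = reverse_lookup(ip).rstrip(".").lower()
    except OSError:  # socket.herror: no PTR record at all
        return False
    if not hostname.endswith(GOOGLE_CRAWLER_SUFFIXES):
        return False  # PTR points outside Google's crawler domains
    try:
        addresses = forward_lookup(hostname)
    except OSError:  # socket.gaierror: hostname does not resolve
        return False
    # Whoever controls a netblock controls its PTR records, so the claimed
    # hostname must also resolve back to the same address.
    return ip in addresses


def is_spoofed_googlebot(user_agent: str, ip: str, **lookups) -> bool:
    """True when the User-Agent claims Googlebot but the DNS check fails."""
    return "googlebot" in user_agent.lower() and not is_verified_googlebot(ip, **lookups)
```

A failed check only says the client is not a verified Googlebot; working out who actually owns the netblock still takes a whois or IP-database lookup, and as noted above, those sources can disagree.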

Anyhow, it helps to have your website block all requests from specific netblocks, except requests for robots.txt, just to be dandy.
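
The rule described above (refuse everything from chosen netblocks, but still serve robots.txt) can be sketched in Python with the standard `ipaddress` module. The netblocks are RFC 5737 documentation ranges used as placeholders, not SemRush's real addresses, and `should_block` is a hypothetical helper; in practice the same logic usually lives in the web server or firewall configuration.

```python
import ipaddress

# Hypothetical blocklist: RFC 5737 documentation ranges as placeholders,
# not any crawler's real netblocks.
BLOCKED_NETBLOCKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def should_block(client_ip: str, path: str) -> bool:
    """Block every request from a listed netblock, except for /robots.txt."""
    if path == "/robots.txt":
        return False  # always let crawlers read the rules
    addr = ipaddress.ip_address(client_ip)
    # An IPv6 address is never "in" an IPv4 network, so it is simply not matched.
    return any(addr in net for net in BLOCKED_NETBLOCKS)
```

`path` is assumed to be the request path without a query string.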

Last updated 2 years ago

Benjamin Carr, Ph.D. 👨🏻‍💻🧬 · @BenjaminHCCarr

977 followers · 2499 posts · Server hachyderm.io

Now you can block #OpenAI’s #webcrawler
OpenAI now lets you block its web crawler from scraping your site to help train #GPT models. OpenAI said website operators can specifically disallow its #GPTBot crawler in their site's #Robots.txt file or block its IP address.
https://www.theverge.com/2023/8/7/23823046/openai-data-scrape-block-ai #privacy #security #RobotsTxT
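
The robots.txt route amounts to a group with `User-agent: GPTBot` and `Disallow: /`. The sketch below checks such rules with Python's standard `urllib.robotparser`; `example.com` is a placeholder, and the catch-all group that allows every other crawler is an assumption about what a typical site wants, not part of OpenAI's instructions.

```python
from urllib.robotparser import RobotFileParser

# robots.txt that blocks OpenAI's GPTBot site-wide; the "*" group that
# allows every other crawler is an assumed, typical companion rule.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# can_fetch only compares the text before the first "/" of the agent string,
# so pass the bare token ("GPTBot"), not the full User-Agent header.
for agent in ("GPTBot", "Googlebot"):
    print(agent, parser.can_fetch(agent, "https://example.com/blog/post"))
```

Run as-is, this prints False for GPTBot and True for Googlebot. robots.txt is advisory, so the IP-blocking option mentioned above is the enforcement route.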

Last updated 2 years ago

eicker.news #technews · @technews

84 followers · 786 posts · Server eicker.news

The Verge - Now you can block OpenAI’s web crawler

»Now you can block #OpenAI’s #webcrawler: Internet users can block #GPTBot and keep their site out of #ChatGPT.« https://www.theverge.com/2023/8/7/23823046/openai-data-scrape-block-ai?eicker.news #tech #media

Last updated 2 years ago

CryptoNewsBot · @cryptonewsbot

686 followers · 36280 posts · Server schleuss.online

OpenAI launches web crawler 'GPTBot' amid plans for next model: GPT-5 - ChatGPT users have the option to scrap the web crawler by adding ... - https://cointelegraph.com/news/open-ai-launch-gptbot-web-crawler-amid-gpt5-trademark #u.s.patentandtrademarkoffice #computerfraudandabuseact #privateinformation #worldwideweb #webcrawler #aimodel #paywall #openai #gptbot #policy #gpt-4 #gpt-5

Last updated 2 years ago

MathDaTech :fedora: 🤘 · @mathdatech1

284 followers · 1052 posts · Server hostux.social

#Shaarli: crul https://www.crul.com/ #webcrawler #webScraping #DataPorn

Last updated 3 years ago

postmodern · @postmodern

997 followers · 829 posts · Server ruby.social

Released spidr 0.7.0. Added a `Spidr.domain` method for spidering a domain and any of its sub-domains.
https://github.com/postmodern/spidr/blob/master/ChangeLog.md#070--2022-12-31
https://github.com/postmodern/spidr
#ruby #webspider #webcrawler #spidering
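
Spidering "a domain and any sub-domains" comes down to a scope rule: follow a link only if its host is the domain itself or ends in "." plus the domain, so that a look-alike such as notexample.com stays out. The Python sketch below shows that rule inside a small breadth-first crawl; it is not spidr's Ruby implementation, and `get_links` is a hypothetical stand-in for fetching a page and extracting its links.

```python
from collections import deque
from typing import Callable, Iterable, List
from urllib.parse import urljoin, urlsplit


def in_domain_scope(url: str, domain: str) -> bool:
    """True if url's host is `domain` itself or any sub-domain of it."""
    host = (urlsplit(url).hostname or "").rstrip(".")  # hostname is lowercased
    domain = domain.rstrip(".").lower()
    # Compare on a label boundary so "notexample.com" is out of scope.
    return host == domain or host.endswith("." + domain)


def spider_domain(
    start_url: str,
    domain: str,
    get_links: Callable[[str], Iterable[str]],
    max_pages: int = 100,
) -> List[str]:
    """Breadth-first crawl that only follows links inside the domain scope."""
    seen = {start_url}
    queue = deque([start_url])
    visited: List[str] = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for href in get_links(url):
            link = urljoin(url, href)  # resolve relative links
            if link not in seen and in_domain_scope(link, domain):
                seen.add(link)
                queue.append(link)
    return visited
```

Starting from https://example.com/, this would follow links to docs.example.com but skip links to other sites.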

Last updated 3 years ago

Relly Annett-Baker · @RellyAB

246 followers · 282 posts · Server mastodon.social

Web nerds, developers, and content modelling types: please go follow @eaton and read about the extraordinary box of tricks he and Autogram have been working on for web analysis.

Spidergram is honestly such an exciting tool: every time they showed me a bit of it, I could think of a new use case and problem it would help with, and I vibrated in my chair a bit.

#webcrawler #webdev #contentmodel #spidergram

Last updated 3 years ago

青い暗闇 · @alceawisteria

2 followers · 122 posts · Server koyu.space

#darktimes #webrings

"It may be hard to believe, but there was once a time on the internet before #Google existed. In those dark times, when you wanted to look for something, you had to use a site like #WebCrawler, #Lycos, #AltaVista (unless you live in Pawnee and are still using it), and #yahoo.
Yahoo back then wasn't so much a search engine as it was a #phonebook. It was a #hierarchical listing of #websites grouped together by #subject."

http://insufficientscotty.com/2012/03/14/whatever-happened-to-webrings/

Last updated 3 years ago

Roman · @AlwaysSleepy

10 followers · 9 posts · Server fosstodon.org

#introduction
Hi All,
I'm a software developer (mainly #csharp) who is interested in rarely used, rarely developed, almost undocumented technologies like #bittorent #bencode #onion #webcrawler etc.
Also, sooner or later I'll dive into #deeplearning and #ai.