Redhotcyber · @redhotcyber
588 followers · 1789 posts · Server mastodon.bida.im
Robert Rothenberg · @rrwo
87 followers · 28 posts · Server floss.social

Then there's the SemRush bot, which requests robots.txt (which politely tells it to f*** off), so it makes other requests from the same netblock but with a user agent claiming to be Googlebot.

(A whois says that the block is owned by SemRush, though DB IP lists it as owned by Google. Interesting.)

Anyhow, it helps to have your website block all requests from specific netblocks, unless they request robots.txt just to be dandy.
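
A minimal sketch of that rule, assuming the check runs somewhere in the request path (middleware, a WSGI wrapper, or similar). The name `should_block` and the netblocks are hypothetical; the ranges shown are documentation-only placeholders, not SemRush's actual addresses.

```python
import ipaddress

# Hypothetical netblocks to refuse (reserved documentation ranges, for illustration only).
BLOCKED_NETBLOCKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]


def should_block(client_ip: str, path: str) -> bool:
    """Return True if a request from client_ip for path should be refused."""
    # robots.txt is always served, even to clients inside a blocked netblock.
    if path == "/robots.txt":
        return False
    address = ipaddress.ip_address(client_ip)
    # Membership is False when the address and network IP versions differ.
    return any(address in network for network in BLOCKED_NETBLOCKS)
```

Matching on the source address rather than the user agent is the point here: a crawler can claim to be Googlebot, but it cannot as easily change the netblock it connects from.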

#webcrawler

Last updated 1 year ago

Now you can block OpenAI's web crawler
OpenAI now lets you block its web crawler from scraping your site to help train models. OpenAI said website operators can specifically disallow its crawler in their site's robots.txt file or block its IP address.
theverge.com/2023/8/7/23823046
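
A rough sketch of what such a robots.txt rule could look like, checked with Python's standard `urllib.robotparser`. The user agent token `GPTBot` comes from the crawler's name; the site-wide `Disallow: /` rule, the helper name `is_allowed`, and the `example.com` URLs are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt content: refuse GPTBot everywhere, say nothing about other crawlers.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(is_allowed("GPTBot", "https://example.com/page"))
    print(is_allowed("Googlebot", "https://example.com/page"))
```

Note that robots.txt is only a request: it works for crawlers that choose to honor it, which is why blocking by IP address is mentioned as the other option.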

#openai #webcrawler #gpt #gptbot #robots #privacy #security #robotstxt

Last updated 1 year ago

eicker.news #technews · @technews
84 followers · 786 posts · Server eicker.news
CryptoNewsBot · @cryptonewsbot
686 followers · 36280 posts · Server schleuss.online

OpenAI launches web crawler 'GPTBot' amid plans for next model: GPT-5 - ChatGPT users have the option to scrap the web crawler by adding ... - cointelegraph.com/news/open-ai

#gpt #policy #gptbot #openai #paywall #aimodel #webcrawler #worldwideweb #privateinformation #computerfraudandabuseact #u

Last updated 1 year ago

MathDaTech :fedora: 🤘 · @mathdatech1
284 followers · 1052 posts · Server hostux.social
postmodern · @postmodern
997 followers · 829 posts · Server ruby.social

Released spidr 0.7.0. Added a `Spidr.domain` method for spidering a domain and any of its sub-domains.
github.com/postmodern/spidr/bl
github.com/postmodern/spidr

#ruby #webspider #webcrawler #spidering

Last updated 2 years ago

Relly Annett-Baker · @RellyAB
246 followers · 282 posts · Server mastodon.social

Web nerds, developers and content modelling types - please go follow @eaton and read about the extraordinary box of tricks he and Autogram have been working on for Web Analysis.

Spidergram is honestly such an exciting tool, every time they showed me a bit of it I could think of a new use case and problem it would help with, and I vibrated in my chair a bit.

#webcrawler #webdev #contentmodel #spidergram

Last updated 2 years ago

青い暗闇 · @alceawisteria
2 followers · 122 posts · Server koyu.space

"It may be hard to believe, but there was once a time on the internet before existed. In those dark times, when you wanted to look for something, you had to use a site like , , (unless you live and Pawnee and are still using it), and .
Yahoo back then wasn’t so much a search engine, as it was a . It was a listing of grouped together by

insufficientscotty.com/2012/03

#darktimes #webrings #google #webcrawler #lycos #altavista #yahoo #phonebook #hierarchical #websites #subject

Last updated 2 years ago

Roman · @AlwaysSleepy
10 followers · 9 posts · Server fosstodon.org


Hi All,
I'm a software developer (mainly C#) who is interested in rarely used, rarely developed, and almost undocumented technologies like BitTorrent, Bencode, Onion, web crawlers, etc.
Also, sooner or later I'll dive into deep learning and AI.

I'm currently working on my custom CLI torrent client for Windows, and then for Linux, just for self-educational purposes.

Thanks,
Roman

#introduction #csharp #bittorent #bencode #onion #webcrawler #deeplearning #ai #cli #torrent #linux

Last updated 2 years ago

Holger Behrens · @hbrns
76 followers · 169 posts · Server fosstodon.org

Replace webcrawler with Webscraper 😂

#webcrawler #Webscraper

Last updated 2 years ago

Holger Behrens · @hbrns
76 followers · 169 posts · Server fosstodon.org

Wifey asked for an Android app to find bargains in the local supermarket.

Challenge accepted.

Wrote a first web crawler using Playwright and Python. Next learned Flutter. And adb.

Installed the first version on her phone yesterday.

She tried it. Found a bargain she wanted. Told her to double check online with the store. ... She could not find it there!?!

I checked. Bargain will only be available for three days next week in a local store. The bargain isn't offered at main branch… 1/2

#android #bargains #webcrawler #playwright #python #flutter #adb

Last updated 2 years ago

RA Michael Seidlitz · @ramichaelseidlitz
713 followers · 5393 posts · Server mastodon.cloud



Google Fonts log file analysis: website visit only by a web crawler / bot?

All the information on the current wave of Google Fonts cease-and-desist letters
by attorney Peter Harlander

marketingrecht.eu/google-fonts

Data protection violation over Google Fonts: data protection lawyer sends out cease-and-desist letters
by David Wurm

techniknews.net/news/datenschu

#google_fonts #Logfile_Analyse #Website_Besuch #webcrawler #bot

Last updated 2 years ago

alexanderadam · @alexanderadam
301 followers · 3369 posts · Server ruby.social

Another web crawler for @CrystalLanguage@twitter.com:

github.com/grkek/anonymous

This is based on Elixir's Crawly.

#crystallang #webcrawler #elixirlang

Last updated 3 years ago

Tarnkappe.info · @tarnkappeinfo
1532 followers · 3788 posts · Server social.tchncs.de