I specialize in software performance and scalability.
To date I'm aware of about 150 techniques and patterns for achieving that. They are all in my own toolbox. (I hope?)
One of them... is request throttling.
#Twitter
#performance
#scalability
#scaling
#scale
#scraping
#scrapers
#spiders
#bots
#programming
#software
#DDoS
#SRE
#Internet
#DistributedSystems
#HPC
#web
#throttling
#RateLimits
Hello @communitydata!
Under Article 15 of the UK General Data Protection Regulation (#GDPR) I hereby submit a Subject Access Request for ANY and ALL personally identifiable information (PII) you hold on me, which includes (but is not limited to) my usernames (@aendra, formerly @aendra@4estate.media) and IP addresses. Please send any relevant data, ideally as a CSV or JSON file, either to @aendra on Keybase or by emailing an archive encrypted using the public key linked on my profile to data[at]aendra.com. Please also report how that data was originally acquired.
Upon receipt of this message, you have one calendar month to comply as per your statutory duties, regardless of where in the world you or your servers are situated. This is just how #GDPR works, I don't make the rules.
Thank you, have a nice day.
New #blog: Autodetecting and Announcing #Mastodon Scrapers and Crawlers
There've been quite a few #fedisearch issues recently, but the common thread is that there's usually a gap in reporting - they're often live for weeks before people are made aware.
It's not just people's pet projects either: there are other #scrapers active, quietly consuming posts.
So, I built a bot to detect and out them so that fedi admins can block as necessary.
#blog #mastodon #fedisearch #scrapers #infosec #security
New #blog: Tightening #security control over #mastodon public #api endpoints
The concern in fediblock around @cloy's #fedisearch plans earlier in the week prompted me to put my #infosec hat on and look into ways to make it harder for external #scrapers to hit Mastodon's API feeds.
This post suggests a possible solution for concerned instance admins as well as details of some #crawlers I spotted.
#blog #security #mastodon #api #fedisearch #infosec #scrapers #crawlers
Seeing more and more people building #scrapers for the #fediverse, I honestly think this is a bad direction we are heading in. I am not worried about the scraping per se. Ignoring that someone has visibility features turned off and not synchronizing deletes, though, is definitely a problem.
#scrapers #fediverse #scraping #mastodon
Katana is a crawling and spidering framework.
#WebCrawlers #tools #scrapers #apps
SiteSucker is a Macintosh application that automatically downloads websites from the Internet.
It does this by asynchronously copying the site's webpages, images, PDFs, style sheets, and other files to your local hard drive, duplicating the site's directory structure.
#downloaders #scrapers #archive #tools #backup #apps #macos