synlogic · @synlogic
88 followers · 1547 posts · Server toot.io

I specialize in software performance and scalability.

To date I'm aware of about 150 techniques and patterns for achieving that. They are all in my own toolbox. (I hope?)

One of them... is request throttling.

#scrapers #spiders #twitter #performance #scalability #scaling #programming #software #ddos #sre #internet #distributedsystems #hpc #web #throttling #ratelimits #scale #scraping #bots

Last updated 2 years ago

ændra · @aendra
-1 followers · 1161 posts · Server hackers.town

Hello @communitydata!

Under Article 15 of the UK General Data Protection Regulation (GDPR) I hereby submit a Subject Access Request for ANY and ALL personally identifiable information (PII) you hold on me, which includes (but is not limited to) my usernames (@aendra, formerly @aendra@4estate.media) or IP addresses. Please send any relevant data, ideally as a CSV or JSON file, to either @aendra on Keybase, or by emailing an archive encrypted using the public key linked on my profile to data[at]aendra.com. Please also report how that data was originally acquired.

Upon receipt of this message, you have one calendar month to comply as per your statutory duties, regardless of where in the world you or your servers are situated. This is just how GDPR works, I don't make the rules.

Thank you, have a nice day.

#scrapers #gdpr

Last updated 2 years ago

Ben Tasker · @ben
347 followers · 1050 posts · Server mastodon.bentasker.co.uk

New blog: Autodetecting and Announcing Scrapers and Crawlers

There've been quite a few issues recently, but the common thread is that there's usually a gap in reporting - they're often live for weeks before people are made aware.

It's not just people's pet projects either: there are other active scrapers, quietly consuming posts.

So, I built a bot to detect and out them so that fedi admins can block as necessary

bentasker.co.uk/posts/blog/sec

#blog #mastodon #fedisearch #scrapers #infosec #security

Last updated 3 years ago

Ben Tasker · @ben
309 followers · 916 posts · Server mastodon.bentasker.co.uk

New blog: Tightening control over public endpoints

The concern in fediblock around @cloy's plans earlier in the week prompted me to put my hat on and look into ways to make it harder for external scrapers to hit Mastodon's API feeds.

This post suggests a possible solution for concerned instance admins as well as details of some I spotted.

bentasker.co.uk/posts/blog/sec

#blog #security #mastodon #api #fedisearch #infosec #scrapers #crawlers

Last updated 3 years ago

Marlin · @marlin
56 followers · 37 posts · Server haminoa.net

Seeing more and more people building scrapers for the fediverse, I honestly think we are heading in a bad direction. I am not worried about the scraping per se. Ignoring that someone has visibility features turned off, and not synchronizing deletes, is definitely a problem, though.

#scrapers #fediverse #scraping #mastodon

Last updated 3 years ago

Volkan Özçelik · @volkan
16 followers · 821 posts · Server z2h.dev

SiteSucker is a Macintosh application that automatically downloads websites from the Internet.

It does this by asynchronously copying the site's webpages, images, PDFs, style sheets, and other files to your local hard drive, duplicating the site's directory structure.

ricks-apps.com/osx/sitesucker/

#dowloaders #scrapers #archive #tools #backup #apps #macos

Last updated 3 years ago