Christoffer S. · @nopatience
Server swecyb.com

Stumbled upon Trafilatura just now. An amazingly efficient Python lib/tool to extract text from HTML-based pages.

Especially welcome, since Newspaper3k has been abandoned for almost 3 years.

#python #textextraction #trafilatura

Last updated 1 year ago