Stumbled upon Trafilatura just now. An amazingly efficient Python lib/tool to extract text from HTML-based pages.
Especially welcome, since Newspaper3k has been abandoned for almost 3 years.
#python #textextraction #trafilatura
For homemade fresh pasta made with a pasta extruder, you have to use durum wheat flour, or semolina, which has the right properties to withstand the heat of extrusion (#trafilatura).
Only durum wheat gives you perfect homemade fresh pasta. https://www.mangiocongusto.it/perche-usare-la-farina-di-grano-duro-per-la-pasta-trafilata-a-macchina/
#pastafresca #farina #grano #semola #trafilatura
I have been #selfhosting my own #python implementation of an #rss client/server for years. I also have a simple #tui client. From the beginning I struggled to find a stable, working #html2text solution for scraping full article content so that it can be shown in the console and/or used for offline reading. I started with #newspaper3k, later used #goose3, and now, after 5 years, I have finally found a library that works 100% for every feed I am subscribed to: #trafilatura
Great job Adrien Barbaresi!
#selfhosting #python #rss #tui #html2text #newspaper3k #goose3 #trafilatura
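For anyone curious what that looks like in practice, here is a minimal sketch of the fetch, extract, and show-in-console flow, not the actual code from my client. It assumes trafilatura is installed (`pip install trafilatura`) and uses its `fetch_url` and `extract` functions; the helper names `article_text` and `wrap_for_console` are made up for illustration.

```python
import textwrap
from typing import Optional


def article_text(url: str) -> Optional[str]:
    """Download a page and return its main text, or None if a step fails."""
    # Imported here so wrap_for_console stays usable without the dependency.
    import trafilatura  # third-party: pip install trafilatura

    downloaded = trafilatura.fetch_url(url)  # None if the download fails
    if downloaded is None:
        return None
    return trafilatura.extract(downloaded)  # None if no main text is found


def wrap_for_console(text: str, width: int = 80) -> str:
    """Re-wrap each non-empty line as a paragraph for terminal display."""
    paragraphs = [p for p in text.splitlines() if p.strip()]
    return "\n\n".join(textwrap.fill(p, width=width) for p in paragraphs)
```

Usage would be roughly `text = article_text(entry_link)`, then `print(wrap_for_console(text))` when `text` is not `None`, where `entry_link` stands for whatever URL the feed entry carries.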