FedSearch - Federated network search engine


Christoffer S. · @nopatience

1393 followers · 445 posts · Server swecyb.com

Stumbled upon Trafilatura just now. An amazingly efficient Python lib/tool to extract text from HTML-based pages.

Especially welcome since Newspaper3k has been abandoned for almost 3 years.

#Python #TextExtraction #Trafilatura


Last updated 2 years ago


Bloggo e sto · @Blogsdaseguire

6 followers · 313 posts · Server mastodon.cloud

For homemade #pastafresca (fresh pasta) made with pasta extruder machines, you should use durum wheat #farina (flour) or #semola (semolina), which has the right properties to withstand the heat of #trafilatura (extrusion).
Only durum wheat gives you perfect homemade fresh pasta. https://www.mangiocongusto.it/perche-usare-la-farina-di-grano-duro-per-la-pasta-trafilata-a-macchina/


Last updated 2 years ago


Marian · @marian_mizik

102 followers · 182 posts · Server fosstodon.org

I have been #selfhosting my own #python implementation of an #rss client/server for years. I also have a simple #tui client. From the beginning I struggled to find a stable, working #html2text solution for scraping full article content, so that it can be shown in the console and/or used for offline reading. I started with #newspaper3k, later used #goose3, and now, after 5 years, I have finally found a library that works 100% for every feed I am subscribed to: #trafilatura

Great job, Adrien Barbaresi!


Last updated 3 years ago

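The workflow Marian describes is an RSS reader that scrapes the full article behind each feed entry. It could be sketched roughly as follows. This is not his implementation, and several assumptions apply:

* It expects plain RSS 2.0 feeds, meaning `<item>` elements that contain `<title>` and `<link>`. Atom feeds and XML namespaces are not handled.
* Parsing uses the standard library's `xml.etree.ElementTree`.
* Each link is passed to trafilatura's `fetch_url()` and `extract()`.
* All function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Callable, List, Optional, Tuple


def parse_rss_items(feed_xml: str) -> List[Tuple[str, str]]:
    """Return (title, link) pairs from an RSS 2.0 document; items without a link are skipped."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.iter("item"):
        title = (item.findtext("title") or "").strip()
        link = (item.findtext("link") or "").strip()
        if link:
            items.append((title, link))
    return items


def trafilatura_fetch(url: str) -> Optional[str]:
    """Download a page and extract its main text with trafilatura (imported lazily)."""
    import trafilatura  # third-party: pip install trafilatura

    downloaded = trafilatura.fetch_url(url)  # None if the download failed
    if downloaded is None:
        return None
    return trafilatura.extract(downloaded)


def full_text_entries(
    feed_xml: str, fetch: Callable[[str], Optional[str]] = trafilatura_fetch
) -> List[Tuple[str, Optional[str]]]:
    """Pair each feed item's title with its full article text (None when extraction fails)."""
    return [(title, fetch(link)) for title, link in parse_rss_items(feed_xml)]
```

You can try the parsing offline by passing a stub `fetch` callable, as in `full_text_entries(feed_xml, fetch=lambda url: ...)`. The default fetcher downloads each page, so it needs trafilatura installed and network access.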