I have been #selfhosting my own #python implementation of an #rss client/server for years. I also have a simple #tui client. From the beginning I struggled to find a stable, working #html2text solution for scraping full article content, so that it can be shown in the console and/or used for offline reading. I started with #newspaper3k, later used #goose3, and now, after 5 years, I have finally found a library that works 100% for every feed I am subscribed to: #trafilatura
Great job Adrien Barbaresi!
I have now released the change to #mastotuit's HTML parser, from the #BeatifulSoup library to #html2text:
https://gitlab.com/spla/mastotuit/-/commit/b1b1718b67e282734686b5faad5cf6e5eef182bc
Absolutely brilliant!
I think I can now wave goodbye to the #BeatifulSoup library 👋
#html2text by #AaronSwartz works better!
What I have not tested yet is how the configuration I gave #html2text in #mastotuit behaves with links.
Let's see with this one:
I have modified #mastotuit to use the #html2text library by #AaronSwartz.
This post is the first one made using the library by the legendary Aaron Swartz.
@brainblasted There are also several #html2text programs that could be helpful. Here's one I've used before (written in Python): https://github.com/aaronsw/html2text