Angelo Salatino · @angelosalatino
55 followers · 118 posts · Server fediscience.org

RT @BreitingerC
If you've ever extracted information from PDFs, you've probably used a tool like #grobid, #cermine, or #scienceparse.

But which tool is best for this job?

My colleague @MeuschkeN ran the tests and is presenting his results at #iconf23. @iconf

Paper 📰 arxiv.org/abs/2303.09957 twitter.com/MeuschkeN/status/1

#iconf23 #scienceparse #cermine #grobid

Last updated 2 years ago

Andreas Wagner · @anwagnerdreas
665 followers · 1161 posts · Server hcommons.social

@osma Cool, thank you! I had a quick glance and will definitely have a closer look. Do you happen to know if your GPT-3 model was pretrained with (presumably small volumes of) Finnish texts? Either way, it seems to confirm our intuition that recognition and parsing of such data in texts could be quite good, hopefully better than what #grobid or #anystyle presently achieve.

You are of course warmly invited to consider joining us in one way or another. 😃

#grobid #anystyle

Last updated 3 years ago

Osma Suominen · @osma
140 followers · 155 posts · Server sigmoid.social

Has anyone used large language models for extracting #bibliographic #metadata (#dublincore style, e.g. ) from fulltext (PDF) documents? I tried this with a fine-tuned Curie model and the results were outrageously good, at least for doctoral theses. Much better than traditional NLP methods like #grobid.

#bibliographic #dublincore #metadata #openai #gpt3 #grobid #ai #machinelearning #llm

Last updated 3 years ago