could those with #HTR / #OCR ground truth for #Paleoslavonic and #Cyrillic please raise their hands and point me their data-repositories so we could train some open source #eScriptorium cum #kraken models?
#kraken #escriptorium #cyrillic #paleoslavonic #ocr #HTR
We just finished an intensive introduction into #eScriptorium, the #opensource platform #OCR / #HTR at the PSL-week with lovely projects in Latin, French, German, Chinese, Vietnamese, ...
Lovely to discover a couple of new open generalized models trained by others:
#Latin #manuscripts and #incunabula 8th to 15th century.
https://zenodo.org/record/7234166#.Y4MCGuHMKpp
#French 16th to 21st century: https://zenodo.org/record/6657809#.Y4MCfuHMKpo
Finetuning on these will lead to quick success. Especially with the new PASSIM txt2txt alignment.
#french #incunabula #manuscripts #latin #HTR #ocr #opensource #escriptorium
RT @TorstenHiltmann@twitter.com
Was können Verfahren des Maschinellen Lernens #ML, #HTR, #NER, #computerVision etc. für die Analyse tausender Urkundenregister und Stundenbücher mit Blick auf Text, Schrift & Bild aktuell leisten? Mittwoch gibt's Auskunft!
https://dhistory.hypotheses.org/2574
#DigitalHistory #medievaltwitter
🐦🔗: https://twitter.com/TorstenHiltmann/status/1587117600710037505
#medievaltwitter #digitalhistory #computervision #ner #HTR #ml
Die Plattform hätte einige Tools schon dabei: generische #OCR, #HTR und #Spracherkennung, erweiterbare Textextraktoren auf Pluginbasis, #DoubleKeying, eine leistungsfähige #API, Browser-Addons, Website-Widgets, mobile Apps etc. Quasi ein Transkriptionslayer auf das offene Web.
#ocr #HTR #spracherkennung #DoubleKeying #api