Alex Strick van Linschoten · @strickvl
240 followers · 122 posts · Server mathstodon.xyz

🔠 I wrote up some of what I've learned about tokenisation (with examples using Balochi). It's a high-level overview covering why we tokenise words, what options are available to us, and what tradeoffs we accept by choosing one option over another.

mlops.systems/posts/2023-06-01

#nlp #languageModels #Balochi

Last updated 1 year ago

Alex Strick van Linschoten · @strickvl
240 followers · 120 posts · Server mathstodon.xyz

I wrote about the first steps in my Balochi language modelling project. Training a custom tokenizer is my initial short-term goal, but to do that I first needed to put together a small dataset to work with. I describe some of the things I did to that end, along with a list of resources I'm maintaining as I continue on this journey.

mlops.systems/posts/2023-05-29

#lowresource #nlp #Balochi

Last updated 1 year ago

Alex Strick van Linschoten · @strickvl
240 followers · 120 posts · Server mathstodon.xyz

I'm taking the next few months to work on language modelling techniques for low-resource languages. I'll be working with Balochi as spoken in southeastern Iran, for which there aren't many datasets or resources available (at first glance).

mlops.systems/posts/2023-05-21

#iran #Balochi

Last updated 1 year ago

Alireza Dehbozorgi · @BDehbozorgi83
73 followers · 334 posts · Server mastodon.social

A Persian-Balochi bilingual comparative and/or contrastive short video. Fun to watch. The text is from the Bible.

youtube.com/watch?v=jYkxht6rOy

#persian #Balochi

Last updated 2 years ago

Alessandro M · @Nighthawk
14 followers · 861 posts · Server mastodon.uno

RT @MarianoGiustino
Update on Iran:
- the revolution has reached a point of no return
- it is spreading even to the most conservative areas, which have historically supported the regime
- the most sacred symbols of the Islamic Republic have been struck
- in the Kurdish and Balochi areas it is hell
@RadioRadicale

#turchia #Balochi #rivoluzione #iran

Last updated 2 years ago