Leshem Choshen · @LChoshen
1040 followers · 293 posts · Server sigmoid.social

TinyStories: Tiny models are coherent and can follow instructions
If their training data is very simple

What is simple?
What 3-4-year-old vocabularies allow (according to LLMs...)

arxiv.org/abs/2305.07759

#nlproc #llm #babylm

Last updated 1 year ago

Leshem Choshen · @LChoshen
957 followers · 214 posts · Server sigmoid.social

There are three tracks. Two of them require you to use a small training corpus (which we provide) inspired by the input children receive. The third loosens the restrictions: you can pre-train on a small natural-language dataset of your choosing and use unlimited non-linguistic data.

Interested? The training datasets are already out! The evaluation pipeline is coming soon!

Call for Papers: arxiv.org/abs/2301.11796

Website: babylm.github.io

#nlp #nlproc #pretraining #pretrain #babylm

Last updated 2 years ago