Leshem Choshen · @LChoshen
957 followers · 214 posts · Server sigmoid.social

There are three tracks. Two of them require you to use a small training corpus (which we provide) inspired by the linguistic input children receive. The third loosens the restrictions: you can pre-train on a small natural language dataset of your choosing and use unlimited non-linguistic data.

Interested? The training datasets are already out! Evaluation pipeline to come soon!

Call for Papers: arxiv.org/abs/2301.11796

Website: babylm.github.io

#nlp #nlproc #pretraining #pretrain #babylm


Leshem Choshen · @LChoshen
753 followers · 95 posts · Server sigmoid.social

We want to pretrain🤞
Instead we finetune🚮😔
Could we collaborate?🤗

ColD Fusion:
🔄Recycle finetuning to multitask
➡️evolve pretrained models forever

On 35 datasets
+2% improvement over RoBERTa
+7% in few-shot settings
🧵
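A rough sketch of the recycling idea described in this thread: several contributors each finetune the current shared model on their own dataset, their finetuned weights are fused back into a single model (here, by simple parameter averaging as an illustrative assumption, not the paper's actual fusion rule), and the fused model becomes the next round's starting point.

```python
import copy
import torch

def fuse_checkpoints(base_model, finetuned_models):
    """Average the parameters of several finetuned copies of one base model.

    All models are assumed to share the same architecture; plain averaging
    is a simplification of whatever fusion ColD Fusion actually uses.
    """
    fused = copy.deepcopy(base_model)
    fused_state = fused.state_dict()
    states = [m.state_dict() for m in finetuned_models]
    for name, tensor in fused_state.items():
        # Stack each contributor's version of this parameter and take the mean,
        # casting back to the original dtype (some buffers are integer-typed).
        stacked = torch.stack([s[name].float() for s in states])
        fused_state[name] = stacked.mean(dim=0).to(tensor.dtype)
    fused.load_state_dict(fused_state)
    return fused

# One collaborative round (outer loop sketched as pseudocode):
#   finetuned = [finetune(copy.deepcopy(base), ds) for ds in contributor_datasets]
#   base = fuse_checkpoints(base, finetuned)
# Repeating this is the "evolve pretrained models forever" loop from the post.
```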

#nlproc #machinelearning #nlp #ml #modelrecycling #CollaborativeAI #scientivism #pretrain
