Leshem Choshen · @LChoshen
1091 followers · 383 posts · Server sigmoid.social

Did you know:
Evaluating a single model on HELM took
⏱️4K GPU hours or 💸over $10K in API calls?!
Flash-HELM⚡️can reduce costs by up to ×200!
arxiv.org/abs/2308.11696

#deepread #machinelearning #evaluation #eval #nlproc #nlp #llm
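
(Not from the paper — a minimal sketch of the general coarse-to-fine idea: spend the full evaluation budget only on models that survive cheap rounds. `evaluate(model, examples) -> float` is a hypothetical scoring helper.)

```python
import random

def flash_eval(models, dataset, evaluate, start_n=32, growth=4):
    """Coarse-to-fine evaluation sketch: score every model on a small
    sample, keep the top half, grow the sample, repeat."""
    candidates = list(models)
    n = start_n
    while len(candidates) > 1 and n < len(dataset):
        sample = random.sample(dataset, min(n, len(dataset)))
        scores = {m: evaluate(m, sample) for m in candidates}
        candidates.sort(key=scores.get, reverse=True)
        candidates = candidates[: max(1, len(candidates) // 2)]
        n *= growth  # only survivors see the larger (costlier) sample
    return candidates  # ranked survivors
```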

Last updated 1 year ago

Leshem Choshen · @LChoshen
1086 followers · 353 posts · Server sigmoid.social

The newFormer is introduced,
but what do we really know about it?

@ari and others
imagine a new large-scale architecture &
ask how you would interpret its abilities and behaviours 🧵
arxiv.org/abs/2308.00189

#deepread #nlproc #machinelearning

Last updated 1 year ago

Leshem Choshen · @LChoshen
1007 followers · 277 posts · Server sigmoid.social

@mega Linear transformations can skip over layers, even all the way to the end

We can see 👀 what the network 🧠 thought!
We can stop🛑 generating at early layers!

arxiv.org/abs/2303.09435v1

#nlproc #deepread

Last updated 2 years ago

Leshem Choshen · @LChoshen
1007 followers · 271 posts · Server sigmoid.social

🔎What's in a layer?🌹🕵🏻‍♀️

Representations are vectors
If only they were words...

Finding:
Any layer can be mapped well to another linearly
Simple, efficient & interpretable
& improves early exit

arxiv.org/abs/2303.09435v1
Story and 🧵

#nlproc #deepread #machinelearning
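
(A minimal sketch of that finding, assuming hidden states `H_src`, `H_tgt` were already collected from two layers of the model on held-out text — the names and the collection step are illustrative, not the paper's code.)

```python
import torch

def fit_linear_map(H_src: torch.Tensor, H_tgt: torch.Tensor) -> torch.Tensor:
    """Least-squares A such that H_src @ A ≈ H_tgt.
    H_src, H_tgt: (n_tokens, d) hidden states from two layers."""
    return torch.linalg.lstsq(H_src, H_tgt).solution  # (d, d)

def early_exit_logits(h_l: torch.Tensor, A: torch.Tensor, lm_head):
    """Jump from a layer-l state straight to vocabulary logits,
    skipping the remaining transformer layers."""
    return lm_head(h_l @ A)
```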

Last updated 2 years ago

Leshem Choshen · @LChoshen
1004 followers · 260 posts · Server sigmoid.social

Mind-blowing pretraining paradigm

Train the same model to predict in both directions (left-to-right and right-to-left) separately
Better results, more parallelization

arxiv.org/abs/2303.07295

#deepread #nlproc #pretraining #machinelearning
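
(A minimal sketch of the two-direction objective only, with `model` as a stand-in for any decoder returning logits of shape (batch, seq, vocab); how the paper combines the two directions is omitted.)

```python
import torch
import torch.nn.functional as F

def bidirectional_lm_loss(model, tokens: torch.Tensor) -> torch.Tensor:
    """One shared model, two separate next-token objectives."""
    # Left-to-right: predict token t+1 from the prefix
    fwd_logits = model(tokens[:, :-1])                        # (B, T-1, V)
    fwd = F.cross_entropy(fwd_logits.transpose(1, 2), tokens[:, 1:])
    # Right-to-left: reverse the sequence, reuse the same parameters
    rev = tokens.flip(dims=[1])
    bwd_logits = model(rev[:, :-1])
    bwd = F.cross_entropy(bwd_logits.transpose(1, 2), rev[:, 1:])
    return fwd + bwd  # the two passes can also run in parallel
```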

Last updated 2 years ago

Leshem Choshen · @LChoshen
948 followers · 184 posts · Server sigmoid.social

3 hypothesized causes of hallucinations,
only 2 held up

By studying how networks behave while hallucinating, they
filter hallucinations (with great success)

arxiv.org/abs/2301.07779

#nlproc #neuralempty #nlp #deepread
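
(Not the paper's exact measure — a generic introspection-style heuristic in the same spirit: if an output is almost as likely *without* the source, the model was probably ignoring it. `model.logprob` is a hypothetical API.)

```python
import torch

@torch.no_grad()
def hallucination_score(model, src, out) -> float:
    """Higher score = output depends less on the source = more suspicious."""
    lp_with = model.logprob(out, source=src)     # hypothetical API
    lp_without = model.logprob(out, source=None)
    return float(lp_without - lp_with)

def filter_hallucinations(model, pairs, threshold=0.0):
    """Keep only (source, output) pairs that look grounded."""
    return [(s, o) for s, o in pairs
            if hallucination_score(model, s, o) < threshold]
```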

Last updated 2 years ago

Leshem Choshen · @LChoshen
756 followers · 114 posts · Server sigmoid.social

What neurons determine agreement in multilingual LLMs?

No complete answer, but some answers:
Across languages: 2 distinct ways to encode syntax
Share neurons, not info

Autoregressive models have dedicated syntax neurons (MLMs just spread them across)

@amuuueller@twitter.com Yu Xia @tallinzen@twitter.com

#deepread #conlllivetweet2022
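
(A toy probing sketch, not the paper's causal method: rank single neurons by how well their activations separate grammatical from ungrammatical agreement examples.)

```python
import numpy as np

def top_agreement_neurons(acts: np.ndarray, labels: np.ndarray, k=10):
    """acts: (n_examples, n_neurons) activations;
    labels: (n_examples,) with 1 = grammatical, 0 = agreement violation."""
    mu1 = acts[labels == 1].mean(axis=0)
    mu0 = acts[labels == 0].mean(axis=0)
    effect = np.abs(mu1 - mu0) / (acts.std(axis=0) + 1e-8)  # per-neuron effect size
    return np.argsort(-effect)[:k]  # indices of the k most "syntactic" neurons
```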

Last updated 2 years ago