Giulio · @giuliohome
19 followers · 345 posts · Server mastodon.world

On top of tokens and embeddings, are transformers and self-attention the secrets behind ChatGPT? Interesting that they are simple, low-level parallelism primitives, independent from compilers and pipeline models.

twitter.com/ItakGol/status/165

#transformers #selfattention #chatgpt #tokens #embeddings
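
For the curious, a minimal sketch of the primitive in question, assuming standard scaled dot-product self-attention (the NumPy below is illustrative, not from the linked thread). The whole operation is a handful of dense matrix multiplies plus a softmax, which is why it parallelizes so naturally without any special compiler or pipeline support:

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence of embeddings.

    X:  (seq_len, d_model) token embeddings
    Wq, Wk, Wv: (d_model, d_k) learned projection matrices
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv            # three independent matmuls
    scores = Q @ K.T / np.sqrt(K.shape[-1])     # all-pairs similarity, one matmul
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                          # weighted mix of value vectors

# toy usage: 4 tokens, 8-dim embeddings
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
out = self_attention(X, *(rng.normal(size=(8, 8)) for _ in range(3)))
print(out.shape)  # (4, 8)
```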

Last updated 2 years ago

Lynd Bacon · @lyndbacon
1 follower · 6 posts · Server masto.ai

Maybe "attention" as used in common transformer models isn't all you need, or or need at all. Microsoft researchers describe "focal modulation networks" that aid interpretation of image processing:

arxiv.org/abs/2203.11926

#transformer #computervision #selfattention
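
A heavily simplified sketch of the idea, assuming the three stages the paper describes (hierarchical contextualization, gated aggregation, elementwise modulation of the query). The paper's learned depthwise convolutions are swapped here for plain average pooling over growing windows just to keep the sketch short, so shapes and details are illustrative only:

```python
import numpy as np

def focal_modulation(X, Wq, Wg, Wm, levels=2):
    """Simplified focal modulation over a 1-D token sequence.

    X: (seq_len, d) features; Wq, Wm: (d, d); Wg: (d, levels + 1).
    Average pooling stands in for the paper's depthwise convolutions.
    """
    n, d = X.shape
    q = X @ Wq                                  # query projection
    gates = 1 / (1 + np.exp(-(X @ Wg)))         # sigmoid gate per focal level

    ctx_sum = np.zeros_like(X)
    ctx = X
    for l in range(levels):                     # hierarchical contextualization
        k = 2 * l + 3                           # growing receptive field
        ctx = np.stack([
            ctx[max(0, i - k // 2): i + k // 2 + 1].mean(axis=0)
            for i in range(n)
        ])
        ctx_sum += gates[:, l:l + 1] * ctx      # gated aggregation
    ctx_sum += gates[:, -1:] * X.mean(axis=0)   # global context level

    m = ctx_sum @ Wm                            # modulator projection
    return q * m                                # elementwise modulation of query

# toy usage: 6 tokens, 8-dim features
rng = np.random.default_rng(1)
X = rng.normal(size=(6, 8))
out = focal_modulation(X, rng.normal(size=(8, 8)),
                       rng.normal(size=(8, 3)), rng.normal(size=(8, 8)))
print(out.shape)  # (6, 8)
```

Unlike attention's all-pairs score matrix, each token ends up with a single aggregated modulator, which appears to be the map the interpretability claim refers to.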

Last updated 2 years ago

DavΞ MacDonald (admin) · @dave
383 followers · 38 posts · Server mastodon.solar