riku_iki a day ago

The post starts with a wrong statement right away:

"The Transformer architecture revolutionized sequence modeling with its introduction of attention"

Attention was developed before transformers.

Alifatisk a day ago | parent

> Attention was developed before transformers.

I just looked this up and it’s true; this completely changes the timeline I had in my mind! I thought the Transformer paper was also what introduced the attention mechanism, but it existed before and was applied to RNN encoder-decoders. Wow
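For context, the pre-Transformer attention mentioned here is the additive (Bahdanau-style) attention used with RNN encoder-decoders. Below is a minimal NumPy sketch of the scoring-and-weighting step; the function name, weight names, and dimensions are illustrative placeholders, not taken from any particular paper's code.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def additive_attention(decoder_state, encoder_states, W_dec, W_enc, v):
    """Bahdanau-style additive attention over RNN encoder states.

    decoder_state:  (d_dec,)    current decoder hidden state
    encoder_states: (T, d_enc)  one hidden state per source position
    W_dec, W_enc, v: learned parameters (random placeholders here)
    Returns the context vector and the attention weights.
    """
    # score each encoder state against the current decoder state
    scores = np.array([
        v @ np.tanh(W_dec @ decoder_state + W_enc @ h) for h in encoder_states
    ])
    weights = softmax(scores)           # normalize scores into a distribution
    context = weights @ encoder_states  # weighted sum of encoder states
    return context, weights

# toy example with arbitrary sizes
rng = np.random.default_rng(0)
T, d_enc, d_dec, d_att = 5, 8, 6, 4
context, weights = additive_attention(
    rng.normal(size=(d_dec,)),
    rng.normal(size=(T, d_enc)),
    rng.normal(size=(d_att, d_dec)),
    rng.normal(size=(d_att, d_enc)),
    rng.normal(size=(d_att,)),
)
print(weights.round(3), context.shape)
```

The context vector is then fed to the RNN decoder at each output step, which is how attention was used before the Transformer replaced the recurrence entirely.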

logicchains a day ago | parent

Knowing how such things go, it was probably invented by Schmidhuber in the 90s.

esafak a day ago | parent

https://people.idsia.ch/~juergen/1991-unnormalized-linear-tr...
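The link points to Schmidhuber's page on his 1991 fast weight programmers, which he describes as unnormalized linear Transformers: key/value pairs are summed into an outer-product "fast weight" matrix that is then queried linearly, with no softmax. A minimal sketch of that idea follows; the names and dimensions are illustrative, not taken from the 1991 paper.

```python
import numpy as np

def linear_unnormalized_attention(queries, keys, values):
    """Unnormalized linear attention via a fast-weight (outer-product) memory.

    queries, keys: (T, d_k)
    values:        (T, d_v)
    Each step writes np.outer(v_t, k_t) into a weight matrix W, then reads
    with the query: out_t = W @ q_t. No softmax normalization is applied.
    """
    d_k, d_v = keys.shape[1], values.shape[1]
    W = np.zeros((d_v, d_k))          # the "fast weights"
    outputs = []
    for q, k, v in zip(queries, keys, values):
        W += np.outer(v, k)           # write: accumulate key/value association
        outputs.append(W @ q)         # read: linear query against the memory
    return np.array(outputs)

rng = np.random.default_rng(0)
T, d_k, d_v = 4, 3, 2
out = linear_unnormalized_attention(
    rng.normal(size=(T, d_k)), rng.normal(size=(T, d_k)), rng.normal(size=(T, d_v))
)
print(out.shape)  # (4, 2)
```

Each output here equals the causal sum of values weighted by unnormalized key-query dot products, which is the sense in which the fast-weight memory behaves like attention without the softmax.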

cubefox 15 hours ago | parent

Of course.