riku_iki a day ago
The post starts with a wrong statement right away: "The Transformer architecture revolutionized sequence modeling with its introduction of attention." Attention was developed before Transformers.
Alifatisk a day ago | parent
> Attention was developed before transformers.

I just looked this up and it's true; this completely changes the timeline I had in my mind! I thought the Transformer paper was also what introduced the attention mechanism, but it existed before and was applied to RNN encoder-decoders. Wow
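For context, the pre-Transformer setup from Bahdanau et al. (2014) has the decoder attend over the RNN encoder's hidden states with an additive score. A minimal numpy sketch of that idea; all names, shapes, and the random parameters are illustrative assumptions, not code from the paper:

    import numpy as np

    def additive_attention(decoder_state, encoder_states, W_d, W_e, v):
        """Bahdanau-style additive attention over RNN encoder states.

        decoder_state:  (hidden_dim,)         current decoder hidden state
        encoder_states: (src_len, hidden_dim) one hidden state per source token
        W_d, W_e:       (attn_dim, hidden_dim) learned projections (assumed shapes)
        v:              (attn_dim,)            learned scoring vector
        Returns the context vector and the attention weights.
        """
        # Score each encoder state against the current decoder state.
        scores = np.tanh(encoder_states @ W_e.T + decoder_state @ W_d.T) @ v  # (src_len,)
        # Softmax over source positions.
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        # Context vector: weighted sum of encoder states, fed back into the decoder.
        context = weights @ encoder_states  # (hidden_dim,)
        return context, weights

    # Toy usage with random parameters.
    rng = np.random.default_rng(0)
    src_len, hidden_dim, attn_dim = 5, 8, 16
    context, weights = additive_attention(
        rng.standard_normal(hidden_dim),
        rng.standard_normal((src_len, hidden_dim)),
        rng.standard_normal((attn_dim, hidden_dim)),
        rng.standard_normal((attn_dim, hidden_dim)),
        rng.standard_normal(attn_dim),
    )
    print(weights.round(3), context.shape)

What the Transformer paper later did was drop the RNN entirely and keep only (scaled dot-product, multi-head) attention, rather than invent attention itself.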