gnerd00 4 days ago

> Transformer model just trivially blowing everything else out of the water

no, this is the winners rewriting history. Transformer-style encoders are now applied in lots and lots of disciplines, but they do not "trivially" do anything. The hyped re-telling is obscuring the facts of history. Specifically, in human-language text translation, the "Attention is All You Need" Transformer did "blow others out of the water", yes, but only for that application.

arugulum 3 days ago

My statement was

>a (fine-tuned) base Transformer model just trivially blowing everything else out of the water

"Attention is All You Need" was a Transformer model trained specifically for translation, blowing all other translation models out of the water. It was not fine-tuned for tasks other than what the model was trained from scratch for.

GPT-1/BERT were significant because they showed that you can pretrain one base model and use it for "everything".
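
To make the "pretrain one base model, reuse it everywhere" point concrete, here is a minimal sketch of that workflow using the Hugging Face transformers library (my choice of library and checkpoint for illustration, not anything from the original papers): the same pretrained BERT weights are loaded with a fresh task head, which is then fine-tuned on the downstream task.

    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    # One pretrained base checkpoint...
    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

    # ...loaded with a newly initialized classification head for this task.
    # The same checkpoint could instead be loaded with a token-classification
    # or question-answering head and fine-tuned for that task.
    model = AutoModelForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=2
    )

    # Forward pass on one example; in practice you would fine-tune the
    # whole model on labeled data for the downstream task.
    inputs = tokenizer("A sentence to classify.", return_tensors="pt")
    logits = model(**inputs).logits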