DavidSJ 2 days ago
> I'm not sure what your point is?

I was just responding to this claim:

> An LLM was only ever meant to be a linguistics model, not a brain or cognitive architecture.

Plenty of people did in fact see a language model as a potential path towards intelligence, whatever might be said about the beliefs of Mr. Uszkoreit specifically.

There's some ambiguity as to whether you're talking about the Transformer specifically, or language models generally. The "recent history" of RNNs and LSTMs you refer to dates back to before the paper I linked.

I won't speak to the motivations or views of the specific authors of Vaswani et al., but there's a long history, both distant and recent, of drawing connections between information theory, compression, prediction, and intelligence, including in the context of language modeling.
HarHarVeryFunny 2 days ago
I was really talking about the Transformer specifically. Maybe there was an implicit hope of a better/larger language model leading to new intelligent capabilities, but I've never seen the Transformer designers say they were targeting this, or claim (to their credit) to have expected any significant new capabilities even after it was already apparent how capable it was. Neither Google's initial fumbling of the tech nor Shazeer's entertainment chatbot foray seems to indicate that they had been targeting, and/or realized they had achieved, a more significant advance than the more efficient seq-2-seq model which had been their proximate goal.

To me it seems that the Transformer is really one of industry/science's great accidental discoveries. I don't think it's just the ability to scale that made it so powerful, but more the specifics of the architecture, including the emergent ability to learn "induction heads" (sketched below), which seem core to a lot of what they can do.

The Transformer precursors I had in mind were recent ones, in particular Sutskever et al.'s "Sequence to Sequence Learning with Neural Networks" [LSTM] from 2014 and Bahdanau et al.'s "Jointly Learning to Align and Translate" from 2015, followed by the "Attention Is All You Need" Transformer paper in 2017.
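(For concreteness, here's a minimal, hand-coded sketch of the induction-head pattern referred to above: completing [A][B] ... [A] with [B]. It's a toy illustration only, not code from the Transformer paper or its authors; a real induction head is a learned pair of attention heads, and the function name and example sequence here are made up for illustration.)

    # Toy sketch of the induction-head pattern: [A][B] ... [A] -> predict [B].
    # A real induction head is a learned pair of attention heads (a previous-token
    # head plus a prefix-matching/copying head); this hard-coded version only
    # shows the behavior being described.
    def induction_head_predict(tokens):
        """Return a next-token distribution for the last token in `tokens`,
        built by copying whatever followed earlier occurrences of that token."""
        current = tokens[-1]
        # Attend to positions i+1 wherever tokens[i] matches the current token.
        followers = [tokens[i + 1] for i in range(len(tokens) - 1) if tokens[i] == current]
        if not followers:
            return {}  # no earlier match; a trained model would rely on other heads
        weight = 1.0 / len(followers)  # uniform "attention" over the matches
        dist = {}
        for tok in followers:
            dist[tok] = dist.get(tok, 0.0) + weight
        return dist

    # "... the cat sat on the" -> predicts "cat", copied from the earlier "the cat".
    print(induction_head_predict(["the", "cat", "sat", "on", "the"]))  # {'cat': 1.0}

Run on the example sequence, the head "copies" whatever followed the previous occurrence of the current token, which is the in-context pattern-completion behavior attributed to induction heads.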