movpasd 6 days ago
That's also my view. It's clear that these models are more than pure language algorithms. Somewhere within the hidden layers are real, effective working models of how the world works. But the power of real humans is the ability to learn on-the-fly.

Disclaimer: These are my not-terribly-informed layperson's thoughts :^)

The attention mechanism does seem to give us a certain adaptability (especially in the context of research showing chain-of-thought "hidden reasoning"), but I'm not sure that it's enough.

Thing is, earlier language models used recurrent units that could store intermediate data, which gave more of a foothold for this kind of on-the-fly adjustment.

And here is where the theory hits the brick wall of engineering. Transformers are not just a pure machine-learning innovation; the key is that they are massively scalable, and my understanding is that part of this comes from the _lack_ of recurrence. (A toy sketch of the contrast is at the end of this comment.)

I guess this is where the interest in foundation models comes from. If you could take a codebase as a whole and turn it into effective training data to adjust the weights of an existing, more broadly-trained model, you would recover some of that adaptability (rough sketch also below). But is this possible with a single codebase's worth of data?

Here again we see the power of human intelligence at work: the ability to quite consciously develop new mental models even given very little data. I imagine this is made possible by leaning on very general internal world-models that let us predict the outcomes of even quite complex unseen ("out-of-distribution") situations, and that effectively gives us extra data. That process is what we experience as the frustrations and difficulties of learning.
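To make the recurrence-vs-attention contrast concrete, here's a toy numpy sketch. All names, shapes, and weights are made up for illustration; this is neither architecture as actually implemented:

    import numpy as np

    d = 8                          # toy hidden/embedding size
    rng = np.random.default_rng(0)

    # --- Recurrent unit: a persistent hidden state threaded through time.
    W_in, W_h = rng.normal(size=(d, d)), rng.normal(size=(d, d))

    def rnn_step(h, x):
        # The hidden state h is the "foothold": it can accumulate
        # intermediate results across arbitrarily many steps.
        return np.tanh(x @ W_in + h @ W_h)

    # --- Attention: no carried state; each query re-reads the whole context.
    def attention(q, K, V):
        scores = K @ q / np.sqrt(d)       # similarity of query to each key
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()          # softmax over context positions
        return weights @ V                # weighted mix of values

    xs = rng.normal(size=(5, d))          # a 5-token toy sequence

    h = np.zeros(d)
    for x in xs:                          # inherently sequential:
        h = rnn_step(h, x)                # step t depends on step t-1

    out = attention(xs[-1], xs, xs)       # no dependence between positions,
                                          # so all of them can run in parallel

The sequential loop in the recurrent version is exactly the thing that blocks the massive parallelism transformers get, which is the engineering trade-off I mean.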
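And here's roughly what I mean by turning a codebase into training data for an existing model, sketched with the Hugging Face libraries. The model name, the "my_repo" path, and all hyperparameters are placeholders, not recommendations:

    from pathlib import Path
    from datasets import Dataset
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              DataCollatorForLanguageModeling,
                              Trainer, TrainingArguments)

    model_name = "gpt2"                   # stand-in for a broadly-trained model
    tok = AutoTokenizer.from_pretrained(model_name)
    tok.pad_token = tok.eos_token
    model = AutoModelForCausalLM.from_pretrained(model_name)

    # Treat every source file in the repo as raw training text.
    files = [p.read_text() for p in Path("my_repo").rglob("*.py")]
    ds = Dataset.from_dict({"text": files}).map(
        lambda b: tok(b["text"], truncation=True, max_length=512),
        batched=True, remove_columns=["text"])

    trainer = Trainer(
        model=model,
        args=TrainingArguments("out", num_train_epochs=1,
                               per_device_train_batch_size=1),
        train_dataset=ds,
        data_collator=DataCollatorForLanguageModeling(tok, mlm=False))
    trainer.train()                       # nudges the weights toward this
                                          # one codebase's patterns

Whether a single repo is enough data to do this without the model just memorizing or overfitting is exactly the open question I'm gesturing at.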