didibus | 2 days ago
If you observe the failure modes of current models, you see that they fail in ways that align with probabilistic token prediction. I don't mean that the textual prediction is simple; it's very advanced and learns all kinds of relationships, patterns, and so on. But the model doesn't build a real model of, or reason about, the actual problem. It predicts what text would be a linguistically and semantically probable description of a solution. Because human language embeds so much of our logic and ground truth, that's often good enough to produce a textual description that approximates or nails the actual underlying problem. And this is why we see them solving quite advanced problems.

I admit that people are now wondering: what's different about human thinking? Maybe we do the same thing. You invent a plausible-sounding answer, check whether it's correct, and rinse and repeat until you find one that works. But that in itself is a big conjecture. We don't really know how human thinking works. We've found a method that works well for computers, and now we wonder if maybe we're just the same thing scaled even higher, or with slight modifications.

The ML experts I've heard from don't seem to think so, though. Most seem to believe a different architecture will be needed: world models, ensembles of specialized models with different architectures working together, etc. That LLMs are fundamentally limited by their nature as next-token predictors.
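To be concrete about the "guess, then check" analogy: conceptually it's just a generate-and-verify loop, something like the toy sketch below. Everything here is hypothetical; propose() stands in for "invent a probable-sounding answer" and is_correct() for whatever verification step you imagine (running tests, re-reading the problem, etc.).

    import random

    def propose(problem: str) -> str:
        # Stand-in for "invent a probable-sounding answer".
        return f"candidate #{random.randint(1, 1000)} for: {problem}"

    def is_correct(problem: str, answer: str) -> bool:
        # Stand-in for "check if it was correct"; here it just passes at random.
        return random.random() < 0.1

    def solve(problem: str, max_attempts: int = 100) -> str | None:
        # Rinse and repeat until a candidate passes the check or we give up.
        for _ in range(max_attempts):
            candidate = propose(problem)
            if is_correct(problem, candidate):
                return candidate
        return None

    print(solve("some underlying problem"))

The open question is whether human thinking is anything like this loop, or whether the proposing step in our heads already involves a kind of world model that LLMs don't have.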