horizion2025 5 days ago
I think you should drop the "stochastic text transformer" label you have probably heard applied, and instead think of them as the neural networks they are. The term says absolutely zero about capabilities but creates a subjective 'reduction'; it's just a thought-terminating cliché.

Let's assume for the sake of argument that current LLMs are a mirage, but that in the future some new technology emerges that offers true intelligence and true reasoning. At the end of the day such a system will also input text and output text, and the output will probably be produced piecemeal, as current LLMs (and humans) do. So voilà: it is also a "stochastic text transformer".

Yes, LLMs were trained to predict the next token. But clearly they are not just a small statistical table or whatever. Rather, it turns out that to be good at predicting the next token, past some point you need a lot of extra capabilities, which is why they emerge during training. "Next-token prediction" is an overly abstract, erasing name for what is going on. From this vantage point, a child learning to write, to fill in math lessons etc. is also learning 'next-token prediction'. It says nothing about what goes on inside the brain of the child, or indeed inside the LLM.

It is a confusion between interface and implementation. Behind the interface getNextToken(String prefix) may be hiding a simple table, a 700-billion-parameter neural network, or a 100-billion-neuron human brain.
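The interface/implementation distinction can be sketched in code. Here's a minimal Java illustration (all class names and the sample data are made up for the example): two wildly different implementations sit behind the same getNextToken interface, and a caller cannot tell which one it is talking to.

```java
import java.util.Map;

// The interface says nothing about what produces the token.
interface TokenPredictor {
    String getNextToken(String prefix);
}

// Implementation 1: a trivial lookup table.
class TablePredictor implements TokenPredictor {
    private final Map<String, String> table = Map.of(
        "the cat sat on the", "mat",
        "once upon a", "time"
    );

    public String getNextToken(String prefix) {
        return table.getOrDefault(prefix, "?");
    }
}

// Implementation 2: a stand-in for a 700-billion-parameter network.
// (Here just a stub; the point is that the signature is identical.)
class NeuralPredictor implements TokenPredictor {
    public String getNextToken(String prefix) {
        // Imagine a forward pass through billions of weights here.
        return "mat";
    }
}

public class Demo {
    public static void main(String[] args) {
        TokenPredictor[] predictors = {
            new TablePredictor(), new NeuralPredictor()
        };
        for (TokenPredictor p : predictors) {
            System.out.println(p.getNextToken("the cat sat on the"));
        }
    }
}
```

Both print "mat" for that prefix, so judged purely at the interface they are indistinguishable, which is exactly why "it just predicts the next token" tells you nothing about what's inside.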