didibus 3 days ago

You seem possibly more knowledgeable than me on the matter.

My impression is that LLMs predict the next token based on the prior context. They do that by having learned a probability distribution mapping a context of tokens to the next token.
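
To make that concrete, here's a toy sketch of what "a learned probability distribution over the next token" means. The lookup table and tiny vocabulary are made up for illustration; a real LLM conditions on the whole context with a neural network, but the sampling loop has the same shape:

  import random

  # Hypothetical "learned" distribution: last token -> {candidate next token: probability}.
  NEXT_TOKEN_PROBS = {
      "<start>": {"the": 0.6, "a": 0.4},
      "the": {"cat": 0.5, "dog": 0.5},
      "a": {"cat": 0.5, "dog": 0.5},
      "cat": {"sat": 0.7, "<end>": 0.3},
      "dog": {"sat": 0.7, "<end>": 0.3},
      "sat": {"<end>": 1.0},
  }

  def sample_next(context):
      # Here only the last token matters; a real model uses the full context.
      probs = NEXT_TOKEN_PROBS[context[-1]]
      tokens, weights = zip(*probs.items())
      return random.choices(tokens, weights=weights)[0]

  def generate():
      context = ["<start>"]
      while context[-1] != "<end>":
          context.append(sample_next(context))
      return " ".join(context[1:-1])

  print(generate())  # e.g. "the cat sat"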

Then, as I understand it, the models never reason about the problem itself, only about what the next token should be given the context.

Chain of thought just rewards them so that the next tokens don't predict the final answer directly, but instead predict the reasoning that leads to the solution.
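
As a made-up illustration of the difference (the arithmetic example is mine, not something from any particular training set):

  Without chain of thought, the target text is just the answer:
    Q: What is 17 * 24?  A: 408

  With chain of thought, the target text is the reasoning first:
    Q: What is 17 * 24?  A: 17 * 24 = 17 * 20 + 17 * 4 = 340 + 68 = 408

Either way the model is still only predicting the next token; what changed is which text it is rewarded for predicting.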

Since human language in the dataset contains text that describes many concepts and offers many solutions to problems, predicting the text that describes the solution to a problem often ends up producing the correct solution. That this works was kind of a lucky accident, and it's where all the "intelligence" comes from.

photon_lines 3 days ago

So - in the pre-training step you are right -- they are simple 'statistical' predictors. But there are more steps involved in their training which turn them from simple predictors into models that can capture patterns and reason. I tried to give an intuitive overview of how they do this in the write-up, and I'm not sure I can give you a simple explanation here, but I would recommend you play around with DeepSeek and other more advanced 'reasoning' or 'chain-of-thought' models and ask them to perform tasks for you: they are not simply statistically combining information together. Many times they are able to reason through a problem and come up with extremely advanced working solutions. To me this indicates that they are not 'accidentally' stumbling upon solutions based on statistics -- they actually are able to 'understand' what you are asking them to do and to produce valid results.

didibus 2 days ago

If you observe the failure modes of current models, you see that they fail in ways that align with probabilistic token prediction.

I don't mean that the textual prediction is simple; it's very advanced, and it learns all kinds of relationships, patterns and so on.

But it doesn't have a real model of, or thinking process about, the actual problem. It thinks about what text could describe a solution that is linguistically and semantically probable.

Since human language embeds so much logic and so many ground truths, that's good enough to result in a textual description that approximates or nails the actual underlying problem.

And this is why we see them being able to solve quite advanced problems.

I admit that people are now wondering: what's different about human thinking? Maybe we do the same thing, you invent a probable-sounding answer and then check whether it's correct, rinse and repeat until you find one that works.

But this in itself is a big conjecture. We don't really know how human thinking works. We've found a method that works well for computers and now we wonder if maybe we're just the same but scaled even higher or with slight modifications.

I've heard from ML experts, though, that they don't think so. Most seem to believe different architectures will be needed: world models, ensembles of specialized models with different architectures working together, etc. They see LLMs as fundamentally limited by their nature as next-token predictors.

coderenegade 3 days ago

I think the intuitive leap (or at least, what I believe) is that meaning is encoded in the medium. A given context and input encode a particular meaning that the model is able to map to an output, and because the output is in the same medium (tokens, text), it also has meaning. Even reasoning fits with this, because the model generates additional meaningful context that allows it to better map to an output.

How you find the function that does the mapping probably doesn't matter. We use probability theory and information theory, because they're the best tools for the job, but there's nothing to say you couldn't handcraft it from scratch if you were some transcendent creature.

didibus 2 days ago

Yes, exactly.

The natural-language text it is trained on encodes the solutions to many problems, as well as a lot of ground truths.

The way I think of it is this: first, you have a random text generator. In theory, this generative "model" can find the solution to any problem that text can describe.

If you had a way to check whether it found the correct solution, you could run it and eventually it would generate the text that describes a working solution.

Obviously inefficient and not practical.

What if you made it skip generating all text that isn't valid, sensible English?

Well, now it would find the correct solution in far fewer iterations, but it would still be too slow.

What if it generated only text that makes sense as a continuation of the question's context?

Now you might start to see it 100-shot, 10-shot, maybe even 1-shot some problems.

What if you tuned that to the max? Well, you get our current crop of LLMs.
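
A rough sketch of that generate-and-check loop; the word list, the verifier, and the "correct" answer are all made up, it's only meant to show the shape of the idea:

  import random

  WORDS = ["the", "cat", "sat", "flew", "on", "mat", "moon"]

  def random_text(length=4):
      # Stage 1: a purely random text generator.
      return " ".join(random.choice(WORDS) for _ in range(length))

  def is_correct(candidate):
      # Made-up verifier: pretend we can check whether a candidate solves the problem.
      return candidate == "the cat sat on"

  def solve_by_search(max_tries=100_000):
      # Generate-and-check: keep sampling until the verifier accepts a candidate.
      for _ in range(max_tries):
          candidate = random_text()
          if is_correct(candidate):
              return candidate
      return None

  print(solve_by_search())

Every improvement in the steps above (only sensible English, only text that fits the question, and so on) amounts to making the generator put its probability mass on better candidates, so far fewer tries are needed.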

What else can you do to make it better?

Tune the dataset: remove text that describes wrong answers to a prior context so it learns not to generate those, add more high-quality answers, add more problems and solutions, etc.

Instead of generating the answer to a mathematical equation the above way, have it generate the Python code to run to get the answer (there's a rough sketch of this idea further down).

Instead of generating the answer to questions about current real-world events or facts (like the weather), have it generate the web search query to find it.

If you're asking a more complex question, instead of generating the answer directly, have it generate smaller logical steps towards the answer.

Etc.
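
And a sketch of those last ideas: let the model emit something executable instead of the final answer. The "model" and the router here are fake placeholders (real systems do this with function/tool calling); it just shows the shape of the loop:

  # Hypothetical tool-use loop: instead of trusting the model's answer text,
  # run what it generates and use the result.
  def fake_model(prompt):
      # Stand-in for an LLM call; a real one would generate this routing decision as text.
      if "weather" in prompt:
          return {"tool": "web_search", "input": "current weather"}
      return {"tool": "python", "input": "print(17 * 24)"}

  def run_tool(call):
      if call["tool"] == "python":
          # Execute model-written code (sandbox this in any real system!).
          exec(call["input"])
      elif call["tool"] == "web_search":
          print("(would run a web search for: " + call["input"] + ")")

  run_tool(fake_model("What is 17 * 24?"))     # prints 408
  run_tool(fake_model("What's the weather?"))  # prints the search it would run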