safety1st 6 days ago

I'm pretty much a layperson in this field, but I don't understand why we're trying to teach a stochastic text transformer to reason. Why would anyone expect that approach to work?

I would have thought the more obvious approach would be to couple it to some kind of symbolic logic engine. It might transform plain language statements into fragments conforming to a syntax which that engine could then parse deterministically. This is the Platonic ideal of reasoning that the author of the post pooh-poohs, I guess, but it seems to me to be the whole point of reasoning; reasoning is the application of logic in evaluating a proposition. The LLM might be trained to generate elements of the proposition, but it's too random to apply logic.
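To make that concrete, here is a minimal sketch of the suggested pipeline, assuming a hypothetical llm_translate() stub in place of the model and the real Z3 solver (pip install z3-solver) as the deterministic engine:

    # Sketch only: llm_translate() is a hypothetical stand-in that hard-codes its
    # output; the deterministic checking is done by Z3.
    from z3 import Bool, Implies, Not, Solver, unsat

    def llm_translate(statement: str):
        # A real system would prompt a model to emit formal fragments; here the
        # classic syllogism is hard-coded to keep the sketch self-contained.
        human, mortal = Bool("socrates_is_human"), Bool("socrates_is_mortal")
        return [human, Implies(human, mortal)], mortal

    premises, conclusion = llm_translate(
        "Socrates is a man; all men are mortal; therefore Socrates is mortal.")

    # The conclusion follows iff the premises plus NOT(conclusion) are unsatisfiable.
    s = Solver()
    s.add(*premises, Not(conclusion))
    print("conclusion follows deterministically:", s.check() == unsat)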

_diyar 6 days ago | parent | next [-]

We expect this approach to work because it's currently the best working approach. Nothing else comes close.

Using symbolic language is a good idea in theory, but in practice it doesn't scale as well as auto-regression + RL.

DeepMind's IMO results illustrate this well: in 2024 they reached silver-medal level with AlphaProof and AlphaGeometry, using the Lean language as the formal symbolic layer[1]. In 2025 they performed better, and faster, with an advanced version of Gemini working purely in natural language[2].
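For anyone who hasn't seen Lean, a formal statement plus machine-checkable proof looks roughly like this (a toy example, not taken from AlphaProof):

    -- Lean 4 toy example: if Socrates is human and every human is mortal,
    -- then Socrates is mortal.
    theorem socrates_mortal (Human Mortal : Prop)
        (h1 : Human) (h2 : Human → Mortal) : Mortal :=
      h2 h1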

[1] https://deepmind.google/discover/blog/ai-solves-imo-problems...

[2] https://deepmind.google/discover/blog/advanced-version-of-ge...

Note: I agree with the notion of the parent comment that letting the models reason in latent space might make sense, but that's where I'm out of my depth.

xg15 5 days ago | parent | next [-]

The example is a bit of an unfair comparison though. It takes questions already posed in natural language and, as far as I can tell, expects results in natural language, too.

This means that whatever system is evaluated in this challenge necessarily has to deal with natural language. And indeed, a big part of the AlphaProof system was a neural network to convert from natural language to Lean.

None of this has anything to do with reasoning ability.

I think it would be interesting to present an inverse challenge, where the problems are already posed in a formal language. Would a network that first converts them into natural language, then does chain-of-thought on that, then translates the result back into formal language still be better than a simple symbolic reasoner that could operate on the formal language directly?

safety1st 6 days ago | parent | prev [-]

Very interesting stuff, thanks!

gmadsen 6 days ago | parent | prev | next [-]

Because what can be embedded in billions of parameters is highly counterintuitive and still an active area of research. We do it because it works.

One other point: the Platonic ideal of reasoning is not even an approximation of human reason. The idea that you take away emotion and end up with Spock is a fantasy. All the neuroscience and psychology research points to a necessary, strong coupling of actions and thoughts with emotions. You don't get a functional system from logical deduction alone; at a very basic level it is not functional.

bubblyworld 6 days ago | parent | prev | next [-]

I think that focusing on systems of truth (like formal logics) might be missing the forest for the trees a bit. There are lots of other things we use reasoning for, like decision making and navigating uncertainty, that are arguably just as valuable as establishing truthiness. Mathematicians are very careful to use words like "implication" and "satisfaction" (as opposed to words like "reasoning") to describe their logics, because the philosophers may otherwise lay siege to their department.

A model that is mathematically incorrect (i.e. has some shaky assumptions and inference issues) but nevertheless makes good decisions (like "which part of this codebase do I need to change?") would still be very valuable, no? I think this is part of the value proposition of tools like Claude Code or Codex. Of course, current agentic tools seem to struggle with both unless you provide a lot of guidance, but a man can dream =P

Night_Thastus 6 days ago | parent | prev | next [-]

Congratulations, you've said the quiet part out loud.

Yes, the idea is fundamentally flawed. But there's so much hype and so many dollars to be made selling such services, everyone is either genuinely fooled or sticking their fingers in their ears and pretending not to notice.

safety1st 5 days ago | parent [-]

Flawed or not, I found a lot of the counterpoints people raised totally fascinating. The most significant is probably _diyar's: we've tried multiple approaches, and LLMs are currently doing it better than the others. But the more philosophical stuff is fun too, like the notion that logic and emotion are not actually separable within human cognitive processes, so why would you assume logic will end up off in its own little box in an artificial intelligence? I mean, what a fascinating field of inquiry.

dcre 6 days ago | parent | prev | next [-]

It is sort of amazing that it works, and no one knows why, but empirically it is undeniable that it does. The IMO result was achieved without any tool calls to a formal proof system. I agree that coupling to a formal proof system sounds like the more plausible approach.

https://arstechnica.com/ai/2025/07/google-deepmind-earns-gol...

HarHarVeryFunny 6 days ago | parent [-]

Surely we do know why - reinforcement learning for reasoning. These systems are trained to generate reasoning steps that led to verified correct conclusions during training. There are no guarantees about how they'll perform on different problems, of course, but in relatively narrow, closed domains like math and programming, it doesn't seem surprising that, done at scale, there are enough similar problems that similar reasoning applies and succeeds.
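As a toy illustration of that training signal (nothing like a real training pipeline, just the reward logic): sample a "reasoning strategy", verify the conclusion, and reinforce whatever verified.

    # Toy sketch of RL with verifiable rewards; all names here are made up.
    import random

    strategies = {"add": lambda a, b: a + b,   # the correct strategy
                  "sub": lambda a, b: a - b}   # a wrong one
    weights = {name: 1.0 for name in strategies}

    for _ in range(500):                       # training on "a + b = ?" problems
        a, b = random.randint(1, 9), random.randint(1, 9)
        names = list(weights)
        name = random.choices(names, weights=[weights[n] for n in names])[0]
        if strategies[name](a, b) == a + b:    # verifier checks the conclusion
            weights[name] += 0.1               # reward: bias future sampling
    print(weights)                             # "add" ends up heavily favoured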

dcre 2 days ago | parent [-]

We don't know why that is sufficient to enable the models to develop the capability, and we don't know what they are actually doing under the hood when they employ the capability.

HarHarVeryFunny 6 days ago | parent | prev | next [-]

It can work when:

a) The "reasoning" is regurgitated (in LLM sense) from the training set rather than novel, OR

b) As a slight variation of the above, the model has been RL-trained for reasoning such that its potential outputs are narrowed and biased towards generating reasoning steps that worked (i.e. led to verified correct conclusions) on the reasoning samples it was trained on. In domains like math, where similar sequences of reasoning steps can be applied to similar problems, this works well.

I don't think most people expect LLMs to be good at reasoning in the general case - it's more a matter of "if the only tool you have is a hammer, then every problem is a nail". Today's best general-purpose AI (if not AGI) is LLMs, so people try to use LLMs for reasoning - try to find ways of squeezing all the reasoning juice out of the training data using an LLM as the juicer.

shannifin 6 days ago | parent | prev | next [-]

Problem is, even with symbolic logic, reasoning is not completely deterministic. Whether a given proposition can be derived from a given set of axioms is sometimes undecidable.

bubblyworld 6 days ago | parent [-]

I don't think this is really a problem. The general problem of finding a proof from some axioms to some formula is undecidable (in e.g. first order logic). But that doesn't tell you anything about specific cases, in the same way that we can easily tell whether some specific program halts, like this one:

"return 1"

shannifin 6 days ago | parent [-]

True, I was rather pointing out that being able to parse a symbolic language deterministically doesn't imply that we could then "reason" deterministically in general; the reasoning would still need to involve some level of stochasticity. Whether or not that's a problem in practice depends on the specifics.

wonnage 6 days ago | parent | prev | next [-]

My impression of LLM "reasoning" is that it works more like guardrails. Perhaps the space of possible responses to the initial prompt is huge and doesn't exactly match any learned information, while the text generated during reasoning carries a lot of weight, so placing it in the context should hopefully guide answer generation towards something reasonable.

It's the same idea as manually listing a bunch of possibly-useful facts in the prompt, except the LLM generates the plausible-sounding text itself.

I feel like this also relates to why LLM answers tend to be verbose: the model needs to put the words out there in order to stay coherent.

horizion2025 5 days ago | parent | prev [-]

I think you should drop the "stochastic text transformer" label you have probably heard applied, and instead think of them as the neural networks they are. The reason is that the term says absolutely zero about capabilities but creates a subjective 'reduction'. It's just a thought-terminating cliché.

Let's assume, for the sake of argument, that current LLMs are a mirage, but that in the future some new technology emerges that offers true intelligence and true reasoning. At the end of the day such a system will also take in text and output text, and will probably produce that output piecemeal, as current LLMs (and humans) do. So voilà: it too is a "stochastic text transformer".

Yes, LLMs were trained to predict the next token. But clearly they are not just a small statistical table or whatever. Rather, it turns out that to be good at predicting the next token, after some point you need a lot of extra capabilities, and that's why they emerge during training. "Next-token prediction" is just an abstract name that erases most of what is actually going on. A child learning how to write, filling in math lessons, etc. is also doing 'next-token prediction' from this vantage point; it says nothing about what goes on inside the brain of the child, or indeed inside the LLM. It is a confusion between interface and implementation. Behind the interface getNextToken(String prefix) may be hiding a simple table, a 700-billion-parameter neural network, or a human brain with 100 billion neurons.
Yes LLM's were trained to predict next token. But clearly they are not just a small statistical table or whatever. Rather, it turns out that to be good at predicting the next token, after some point you need a lot of extra capabilities, so that's why they emerge during training. All the "next-token-prediction" is just a way abstract and erasing name of what is going on. A child learning how to write, fill in math lessons etc. is also learning 'next token prediction' from this vantage point. It says nothing about what goes on inside the brain of the child, or indeed inside the LLM. It is a confusion between interface and implementation. Behind the interface getNextToken(String prefix) may either be hiding a simple table or a 700 billion-size neural network or a 100 billion sized neuron human brain.