LudwigNagasena 6 days ago

> The first is that reasoning probably requires language use. Even if you don’t think AI models can “really” reason - more on that later - even simulated reasoning has to be reasoning in human language.

That is an unreasonable assumption. In the case of LLMs it seems wasteful to collapse a point in latent space into a single sampled token and lose information. In fact, I think in the near future it will be the norm for MLLMs to "think" and "reason" without outputting a single "word".
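To make that concrete, here is a toy numpy sketch of what I mean (everything here is made up for illustration and corresponds to no real architecture): the hidden state is updated repeatedly and only collapsed to a discrete token at the very end.

    import numpy as np

    rng = np.random.default_rng(0)
    D = 16                                      # hidden size of a toy "model"
    W = rng.normal(size=(D, D)) / np.sqrt(D)    # stand-in for a transformer block

    def latent_step(h):
        # one silent "thought": update the hidden state without sampling a token
        return np.tanh(W @ h)

    def emit_token(h, vocab):
        # only at the end do we project to the vocabulary and collapse to one word
        logits = h[:len(vocab)]
        return vocab[int(np.argmax(logits))]

    h = rng.normal(size=D)       # state after reading the prompt
    for _ in range(8):           # eight reasoning steps, zero tokens emitted
        h = latent_step(h)

    print(emit_token(h, ["yes", "no", "maybe"]))

The argmax at the end throws away everything in h except one symbol; forcing that collapse after every single step is the waste I'm talking about.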

> Whether AI reasoning is “real” reasoning or just a mirage can be an interesting question, but it is primarily a philosophical question. It depends on having a clear definition of what “real” reasoning is, exactly.

It is not a "philosophical" (by which the author probably meant "practically inconsequential") question. If the whole reasoning business is just a rationalization of pre-computed answers, or simply a means of buying extra computation because each emitted token provides only a fixed amount of compute to update the model's state, then it doesn't make much sense to focus on improving the quality of chain-of-thought output from a human point of view.

safety1st 6 days ago | parent | next [-]

I'm pretty much a layperson in this field, but I don't understand why we're trying to teach a stochastic text transformer to reason. Why would anyone expect that approach to work?

I would have thought the more obvious approach would be to couple it to some kind of symbolic logic engine. It might transform plain language statements into fragments conforming to a syntax which that engine could then parse deterministically. This is the Platonic ideal of reasoning that the author of the post pooh-poohs, I guess, but it seems to me to be the whole point of reasoning; reasoning is the application of logic in evaluating a proposition. The LLM might be trained to generate elements of the proposition, but it's too random to apply logic.
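For what it's worth, the kind of pipeline I'm picturing looks like this toy sketch (the translation step is hand-waved away; only the deterministic checking half is shown, and `entails` is just a brute-force propositional checker I made up for illustration):

    from itertools import product

    def entails(premises, conclusion, atoms):
        # brute-force propositional entailment: the conclusion must hold in
        # every truth assignment that satisfies all the premises
        for values in product([True, False], repeat=len(atoms)):
            env = dict(zip(atoms, values))
            if all(eval(p, {}, env) for p in premises) and not eval(conclusion, {}, env):
                return False
        return True

    # Pretend an LLM has already translated "Socrates is a man; all men are
    # mortal; therefore Socrates is mortal" into these fragments:
    premises = ["man", "(not man) or mortal"]
    conclusion = "mortal"
    atoms = ["man", "mortal"]
    print(entails(premises, conclusion, atoms))   # True: the engine verifies it

The brittle part is of course the translation into the formal fragments, which is exactly where the stochastic model would sit.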

_diyar 6 days ago | parent | next [-]

We expect this approach to work because it's currently the best working approach. Nothing else comes close.

Using symbolic language is a good idea in theory, but in practice it doesn't scale as well as auto-regression + RL.

DeepMind's IMO results illustrate this well: in 2024 they tackled the problems with AlphaProof and AlphaGeometry, which used the Lean language as a formal symbolic logic [1]. In 2025 they performed better and faster with just a fancy version of Gemini working purely in natural language [2].

[1] https://deepmind.google/discover/blog/ai-solves-imo-problems...

[2] https://deepmind.google/discover/blog/advanced-version-of-ge...

Note: I agree with the notion of the parent comment that letting the models reason in latent space might make sense, but that's where I'm out of my depth.

xg15 5 days ago | parent | next [-]

The example is a bit of an unfair comparison though. It takes questions already posed in natural language and, as far as I can tell, expects results in natural language, too.

This means that whatever system is evaluated in this challenge necessarily has to deal with natural language. And indeed, a big part of the AlphaProof system was a neural network to convert from natural language to Lean.

None of this has anything to do with reasoning ability.

I think it would be interesting to present an inverse challenge, where the problems are already posed in a formal language. Would a network that first converts them into natural language, then does chain-of-thought on that, then translates the result back into formal language still be better than a simple symbolic reasoner that could operate on the formal language directly?

safety1st 6 days ago | parent | prev [-]

Very interesting stuff, thanks!

gmadsen 6 days ago | parent | prev | next [-]

because what can be embedded in billions of parameters is highly unintuitive to common sense and an active area of research. We do it because it works.

One other point: the platonic ideal of reasoning is not even an approximation of human reasoning. The idea that you take away emotion and end up with Spock is a fantasy. Neuroscience and psychology research point to a necessary and strong coupling of actions and thoughts with emotions. You don't get a functional system from logical deduction alone; at a very basic level it is not functional.

bubblyworld 6 days ago | parent | prev | next [-]

I think that focusing on systems of truth (like formal logics) might be missing the forest for the trees a bit. There are lots of other things we use reasoning for, like decision making and navigating uncertainty, that are arguably just as valuable as establishing truthiness. Mathematicians are very careful to use words like "implication" and "satisfaction" (as opposed to words like "reasoning") to describe their logics, because the philosophers may otherwise lay siege to their department.

A model that is mathematically incorrect (i.e. has some shaky assumptions and inference issues) but nevertheless makes good decisions (like "which part of this codebase do I need to change?") would still be very valuable, no? I think this is part of the value proposition of tools like Claude Code or Codex. Of course, current agentic tools seem to struggle with both unless you provide a lot of guidance, but a man can dream =P

Night_Thastus 6 days ago | parent | prev | next [-]

Congratulations, you've said the quiet part out loud.

Yes, the idea is fundamentally flawed. But there's so much hype and so many dollars to be made selling such services, everyone is either genuinely fooled or sticking their fingers in their ears and pretending not to notice.

safety1st 5 days ago | parent [-]

Flawed or not, I found a lot of the counterpoints people raised totally fascinating. The most significant is probably _diyar's: we've tried multiple approaches and LLMs are currently doing it better than the others. But the more philosophical stuff is fun too, like the notion that logic and emotion are not actually separable within human cognitive processes, so why would you assume logic will end up off in its own little box in an artificial intelligence? What a fascinating field of inquiry.

dcre 6 days ago | parent | prev | next [-]

It is sort of amazing that it works, and no one knows why, but empirically speaking it is undeniable that it does work. The IMO result was achieved without any tool calls to a formal proof system. I agree that is a much more plausible-sounding approach.

https://arstechnica.com/ai/2025/07/google-deepmind-earns-gol...

HarHarVeryFunny 6 days ago | parent [-]

Surely we do know why - reinforcement learning for reasoning. These systems are trained to generate reasoning steps that led to verified correct conclusions during training. No guarantees how they'll perform on different problems of course, but in relatively narrow closed domains like math and programming, it doesn't seem surprising that when done at scale there are similar enough problems where similar reasoning logic will apply, and it will be successful.
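Stripped of all the machinery, the training signal is about as blunt as this toy sketch (one made-up problem with canned "reasoning traces"; real pipelines use policy-gradient methods rather than this naive weight bump):

    import random

    random.seed(0)

    # Toy "policy": for one problem, a weighted choice over canned reasoning traces.
    traces = [("2 + 3 = 5, so the answer is 5", "5"),
              ("2 + 3 = 6, so the answer is 6", "6")]
    weights = [1.0, 1.0]
    correct_answer = "5"

    def sample():
        return random.choices(range(len(traces)), weights=weights)[0]

    for _ in range(200):
        i = sample()
        _, answer = traces[i]
        reward = 1.0 if answer == correct_answer else 0.0   # the verifier
        weights[i] += 0.1 * reward   # reinforce traces that ended in a verified answer

    print(weights)   # the trace with the correct conclusion ends up dominating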

dcre 2 days ago | parent [-]

We don't know why that is sufficient to enable the models to develop the capability, and we don't know what they are actually doing under the hood when they employ the capability.

HarHarVeryFunny 6 days ago | parent | prev | next [-]

It can work when:

a) The "reasoning" is regurgitated (in LLM sense) from the training set rather than novel, OR

b) As a slight variation of the above, the model has been RL-trained for reasoning such that its potential outputs are narrowed and biased towards generating reasoning steps that worked (i.e. led to verified correct conclusions) on the reasoning samples it was trained on. In domains like math, where similar sequences of reasoning steps can be applied to similar problems, this works well.

I don't think most people expect LLMs to be good at reasoning in the general case - it's more a matter of "if the only tool you have is a hammer, then every problem is a nail". Today's best general-purpose AI (if not AGI) is LLMs, so people try to use LLMs for reasoning - try to find ways of squeezing all the reasoning juice out of the training data using an LLM as the juicer.

shannifin 6 days ago | parent | prev | next [-]

Problem is, even with symbolic logic, reasoning is not completely deterministic. Whether a given proposition can be derived from a given set of axioms is sometimes undecidable.

bubblyworld 6 days ago | parent [-]

I don't think this is really a problem. The general problem of finding a proof from some axioms to some formula is undecidable (in e.g. first order logic). But that doesn't tell you anything about specific cases, in the same way that we can easily tell whether some specific program halts, like this one:

"return 1"

shannifin 6 days ago | parent [-]

True, I was rather pointing out that being able to parse symbolic language deterministically doesn't imply that we could then "reason" deterministically in general; the reasoning would still need to involve some level of stochasticity. Whether or not that's a problem in practice depends on the specifics.

wonnage 6 days ago | parent | prev | next [-]

My impression of LLM “reasoning” is that it works more like guardrails. Perhaps the space of possible responses to the initial prompt is huge and doesn’t exactly match any learned information. The text generated during reasoning conditions the answer strongly, so placing it in the context should hopefully guide answer generation towards something reasonable.

It’s the same idea as manually listing a bunch of possibly-useful facts in the prompt, but the LLM is able to generate plausible sounding text itself.
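In other words, something like this two-pass pattern (call_llm here is just a stub so the sketch is self-contained; swap in a real model call):

    def call_llm(prompt):
        # stub standing in for a real model call, so this sketch runs as-is
        return "(model output for: " + prompt.splitlines()[0][:50] + " ...)"

    question = "Why does the moon look larger near the horizon?"

    # pass 1: have the model write out "possibly useful facts" (the reasoning trace)
    notes = call_llm("List facts and considerations relevant to: " + question)

    # pass 2: answer with those notes pinned into the context, guardrail-style
    answer = call_llm("Notes:\n" + notes + "\n\nUsing the notes above, answer: " + question)
    print(answer)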

I feel like this relates to why LLM answers tend to be verbose too, it needs to put the words out there in order to stay coherent.

horizion2025 5 days ago | parent | prev [-]

I think you should drop the "stochastic text transformer" label you have probably heard applied, and instead think of them as the neural networks they are. The term says absolutely nothing about capabilities but smuggles in a subjective 'reduction'; it's just a thought-terminating cliché.

Let's assume, for the sake of argument, that current LLMs are a mirage, but that in the future some new technology emerges that offers true intelligence and true reasoning. At the end of the day such a system will also take text in and put text out, and the output will probably be piecemeal, just as it is for current LLMs (and humans). So voila: it is also a "stochastic text transformer".

Yes, LLMs were trained to predict the next token. But clearly they are not just a small statistical table. Rather, it turns out that to be good at predicting the next token, beyond some point you need a lot of extra capabilities, which is why they emerge during training. "Next-token prediction" is just an abstract name that erases what is actually going on. A child learning to write or to fill in math exercises is also doing 'next token prediction' from this vantage point; it says nothing about what goes on inside the brain of the child, or indeed inside the LLM. It is a confusion between interface and implementation. Behind the interface getNextToken(String prefix) may be hiding a simple table, a 700-billion-parameter neural network, or a 100-billion-neuron human brain.
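To put that interface-vs-implementation point in code (toy Python, and the second class is obviously a stand-in rather than a real network):

    import random

    class TableModel:
        # the caricature: literally a lookup table behind the interface
        def __init__(self, table):
            self.table = table
        def get_next_token(self, prefix):
            return self.table.get(prefix, "<unk>")

    class BigOpaqueModel:
        # stand-in for a 700-billion-parameter network: same interface,
        # wildly different implementation behind it
        def get_next_token(self, prefix):
            return random.choice(["the", "cat", "sat"])   # pretend this is a forward pass

    for model in (TableModel({"the cat": "sat"}), BigOpaqueModel()):
        print(model.get_next_token("the cat"))

The interface alone tells you nothing about which of these is behind it.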

kazinator 6 days ago | parent | prev | next [-]

Not all reasoning requires language. Symbolic reasoning uses language.

Real-time spatial reasoning like driving a car and not hitting things does not seem linguistic.

Figuring out how to rotate a cabinet so that it will clear through a stairwell also doesn't seem like it requires language, only to communicate the solution to someone else (where language can turn into a hindrance, compared to a diagram or model).

llllm 6 days ago | parent [-]

Pivot!

kazinator 6 days ago | parent [-]

Can we be Friends?

vmg12 6 days ago | parent | prev | next [-]

Solutions to some of the hardest problems I've had have only come after a night of sleep or when I'm out on a walk and I'm not even thinking about the problem. Maybe what my brain was doing was something different from reasoning?

andybak 6 days ago | parent | next [-]

This is a very important point and mostly absent from the conversation.

We have many words that almost mean the same thing or can mean many different things - and conversations about intelligence and consciousness are riddled with them.

tempodox 6 days ago | parent [-]

> This is a very important point and mostly absent from the conversation.

That's because when humans are mentioned at all in the context of coding with “AI”, it's mostly as bad and buggy simulations of those perfect machines.

jojobas 6 days ago | parent | prev [-]

At the very least intermediate points of one's reasoning are grounded in reality.

xg15 5 days ago | parent | prev | next [-]

> The first is that reasoning probably requires language use. Even if you don’t think AI models can “really” reason - more on that later - even simulated reasoning has to be reasoning in human language.

I'd claim that this assumption doesn't even hold true for humans. Reasoning in language is the most "flashy" kind of reasoning and the one that can be most readily shared with other people - because we can articulate it, write it down, publish, etc.

But I know for sure that I'm not constantly narrating my life in my head, like the reasoning traces of LLMs.

A lot of reasoning happens visually, i.e. by imagining some scene and thinking about how it would play out. In other situations, it's spontaneous ideas that "just pop up" - i.e., there are unconscious processes and probably some kind of association involved.

None of that uses language.

limaoscarjuliet 6 days ago | parent | prev | next [-]

> In fact, I think in near future it will be the norm for MLLMs to "think" and "reason" without outputting a single "word".

It will be outputting something, as this is the only way it can get more compute - output a token, then all context + the next token is fed through the LLM again. It might not be presented to the user, but that's a different story.
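Schematically, the loop in question is just this (toy code; forward stands in for one full pass through the network):

    def forward(context):
        # stand-in for one full pass through the network: fixed compute per call
        return "token%d" % len(context)

    context = ["the", "prompt", "tokens"]
    hidden_trace = []
    for _ in range(5):
        token = forward(context)      # one token out...
        context.append(token)         # ...then the whole context goes back in
        hidden_trace.append(token)    # may never be shown to the user, but it exists

    print(hidden_trace)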

LudwigNagasena 5 days ago | parent [-]

That’s the only effective way to get more compute in current production LLMs, but the field is evolving.

pornel 6 days ago | parent | prev | next [-]

You're looking at this from the perspective of what would make sense for the model to produce. Unfortunately, what really dictates the design of the models is what we can train the models with (efficiently, at scale). The output is then roughly just the reverse of the training. We don't even want AI to be an "autocomplete", but we've got tons of text, and a relatively efficient method of training on all prefixes of a sentence at the same time.
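Concretely, "all prefixes at the same time" just means every position in a training sentence doubles as a prediction target, and in a transformer all of those loss terms fall out of a single forward pass. A toy illustration:

    sentence = ["the", "cat", "sat", "on", "the", "mat"]

    # every prefix of the sentence is a training example for the next token
    for i in range(1, len(sentence)):
        prefix, target = sentence[:i], sentence[i]
        print(prefix, "->", target)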

There have been experiments with preserving embedding vectors of the tokens exactly without loss caused by round-tripping through text, but the results were "meh", presumably because it wasn't the input format the model was trained on.

It's conceivable that models trained on some vector "neuralese" that is completely separate from text would work better, but it's a catch 22 for training: the internal representations don't exist in a useful sense until the model is trained, so we don't have anything to feed into the models to make them use them. The internal representations also don't stay stable when the model is trained further.

LudwigNagasena 5 days ago | parent [-]

It’s indeed a very tricky problem with no clear solution yet. But if someone finds a way to bootstrap it, it may be a new qualitative jump that may reverse the current trend of innovating ways to cut inference costs rather than improve models.

kromem 6 days ago | parent | prev | next [-]

Latent space reasoners are a thing, and honestly we're probably already seeing emergent latent space reasoners end up embedded in the weights as new models train on extensive synthetic reasoning traces.

If Othello-GPT can build a board in latent space given just the moves, can an exponentially larger transformer build a reasoner in their latent space given a significant number of traces?
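For anyone curious, the way claims like the Othello-GPT one get tested is with probes: freeze the model, collect hidden activations, and check whether a simple (often linear) classifier can read the board state off them. A toy numpy version with made-up activations:

    import numpy as np

    rng = np.random.default_rng(0)

    # fake "hidden activations" in which one board feature is linearly encoded
    n, d = 500, 64
    direction = rng.normal(size=d)
    acts = rng.normal(size=(n, d))
    labels = (acts @ direction > 0).astype(float)     # e.g. "square e4 is occupied"

    # fit a linear probe on 400 examples, check it on the held-out 100
    probe, *_ = np.linalg.lstsq(acts[:400], labels[:400] * 2 - 1, rcond=None)
    preds = (acts[400:] @ probe > 0).astype(float)
    print("probe accuracy:", (preds == labels[400:]).mean())

High probe accuracy on held-out activations is the usual evidence that the structure really does live in latent space.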

potsandpans 6 days ago | parent | prev | next [-]

> It is not a "philosophical" (by which the author probably meant "practically inconsequential") question.

I didn't take it that way. I suppose it depends on whether or not you believe philosophy is legitimate

LudwigNagasena 6 days ago | parent | next [-]

The author called it “the least interesting question possible” and contrasted it with questions such as “how accurately it reflects the actual process going on.” I don’t see any other way to take it.

Terr_ 6 days ago | parent | prev [-]

> I suppose it depends on whether or not you believe philosophy is legitimate

The only way to declare philosophy illegitimate is to be using legitimate philosophy, so... :p

HarHarVeryFunny 6 days ago | parent | prev | next [-]

I don't think the concept of "real" reasoning vs simulated or fake reasoning makes any sense. LLM reasoning can be regarded as a subset of human reasoning, and a more useful comparison would be not real vs fake, but what is missing from LLM reasoning that would need to be added (likely in a completely new architecture, not an LLM/transformer) to make it more human-like and capable.

Human reasoning, and cortical function in general, also appears to be prediction-based, but there are many differences from LLMs, starting with the fact that we learn continuously and incrementally from our own experience and from our prediction failures and successes. Human reasoning is basically chained what-if prediction, built from the predictive outcomes of individual steps that we have learnt, either as general knowledge or as domain-specific problem-solving steps.

Perhaps there is not so much difference between what a human does and an LLM does in, say, tackling a math problem when the RL-trained reasoning-LLM chains together a sequence of reasoning steps that worked before...

Where the differences come in is in how the LLM learned those steps in the first place, and in what happens when its reasoning fails. In humans these are essentially the same thing: we learn by predicting and giving it a go, and we learn from prediction failure (sensory/etc feedback) to update our context-specific predictions for next time. If we reach a reasoning/predictive impasse - we've tried everything that comes to mind and everything fails - then our innate traits of curiosity and boredom (maybe more?) come into play, and we will explore the problem, learn, and try again. Curiosity and exploration can of course lead to gaining knowledge from things like imitation and the active pursuit (or receipt) of knowledge from sources other than personal experimentation.

The LLM of course has no ability to learn (outside of in-context learning - a poor substitute), so is essentially limited in capability to what it has been pre-trained on, and pre-training is never going to be the solution to a world full of infinite ever-changing variety.

So, rather than say that an LLM isn't doing "real" reasoning, it seems more productive to acknowledge that prediction is the basis of reasoning, but that the LLM (or rather a future cognitive architecture - not a pass-thru stack of transformer layers!) needs many additional capabilities such as continual/incremental learning, innate traits such as curiosity to expose itself to learning situations, and other necessary cognitive apparatus such as working memory, cognitive iteration/looping (cf thalamo-cortical loop), etc.

esafak 6 days ago | parent | prev | next [-]

It is not obvious that a continuous space is better for thinking than a discrete one.

AbrahamParangi 6 days ago | parent | prev | next [-]

You're not allowed to say that it's not reasoning without distinguishing what is reasoning. Absent a strict definition that the models fail and that some other reasoner passes, it is entirely philosophical.

LudwigNagasena 6 days ago | parent | next [-]

I think it’s perfectly fine to discuss whether it’s reasoning without fully committing to any foundational theory of reasoning. There are practical things we expect from reasoning that we can operationalise.

If it’s truly reasoning, then it wouldn’t be able to deceive or to rationalize a leaked answer in a backwards fashion. Asking and answering those questions can help us understand how the research agendas for improving reasoning and improving alignment should be modified.

sdenton4 6 days ago | parent | prev [-]

"entirely philosophical"

I don't think this means what you think it means... Philosophers (at least up to Wittgenstein) love constructing and arguing about definitions.
