crazygringo 3 days ago

> But they do not actually think.

I'm so baffled when I see this being blindly asserted.

With the reasoning models, you can literally watch their thought process. You can see them pattern-match to determine a strategy to attack a problem, go through it piece-by-piece, revisit assumptions, reformulate strategy, and then consolidate findings to produce a final result.

If that's not thinking, I literally don't know what is. It's the same process I watch my own brain use to figure something out.

So I have to ask you: when you claim they don't think -- what are you basing this on? What, for you, is involved in thinking that the kind of process I've just described is missing? Because I genuinely don't know what needs to be added here for it to become "thinking".

Terr_ 3 days ago | parent | next [-]

> I'm so baffled when I see this being blindly asserted. With the reasoning models, you can literally watch their thought process.

Not true, you are falling for a very classic (prehistoric, even) human illusion known as experiencing a story:

1. There is a story-like document being extruded out of a machine humans explicitly designed for generating documents, and which humans trained on a bajillion stories humans already made.

2. When you "talk" to a chatbot, that is an iterative build of a (remote, hidden) story document, where one of the characters is adopting your text-input and the other's dialogue is being "performed" at you.

3. The "reasoning" in newer versions is just the "internal monologue" of a film noir detective character, and equally as fictional as anything that character "says out loud" to the (fictional) smokin-hot client who sashayed into the (fictional) rent-overdue office bearing your (real) query on its (fictional) lips.

> If that's not thinking, I literally don't know what is.

All sorts of algorithms can achieve useful outcomes with "that made sense to me" flows, but that doesn't mean we automatically consider them to be capital-T Thinking.

> So I have to ask you: when you claim they don't think -- what are you basing this on?

Consider the following document from an unknown source, and the "chain of reasoning" and "thinking" that your human brain perceives when encountering it:

    My name is Robot Robbie.
    That high-carbon steel gear looks delicious. 
    Too much carbon is bad, but that isn't true here.
    I must ask before taking.    
    "Give me the gear, please."
    Now I have the gear.
    It would be even better with fresh manure.
    Now to find a cow, because cows make manure.
Now whose reasoning/thinking is going on? Can you point to the mind that enjoys steel and manure? Is it in the room with us right now? :P

In other words, the reasoning is illusory. Even if we accept that the unknown author is a thinking intelligence for the sake of argument... it doesn't tell you what the author's thinking.

crazygringo 3 days ago | parent [-]

You're claiming that the thinking is just a fictional story intended to look like thinking.

But this is false, because the thinking exhibits cause and effect and a lot of good reasoning. If you change the inputs, the thinking continues to be pretty good with the new inputs.

It's not a story, it's not fictional, it's producing genuinely reasonable conclusions around data it hasn't seen before. So how is it therefore not actual thinking?

And I have no idea what your short document example has to do with anything. It seems nonsensical and bears no resemblance to the actual, grounded chain-of-thought processes that high-quality reasoning LLMs produce.

> OK, so that document technically has a "chain of thought" and "reasoning"... But whose?

What does it matter? If an LLM produces output, we say it's the LLM's. But I fail to see how that is significant?

czl 3 days ago | parent | next [-]

> So how is it therefore not actual thinking?

Many consider "thinking" something only animals can do, and they are uncomfortable with the idea that animals are biological machines or that life, consciousness, and thinking are fundamentally machine processes.

When an LLM generates chain-of-thought tokens, what we might casually call “thinking,” it fills its context window with a sequence of tokens that improves its ability to answer correctly.

This “thinking” process is not rigid deduction like in a symbolic rule system; it is more like an associative walk through a high-dimensional manifold shaped by training. The walk is partly stochastic (depending on temperature, sampling strategy, and similar factors) yet remarkably robust.
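To make the "partly stochastic" part concrete, here is a toy sketch of temperature sampling over next-token logits. The vocabulary and logit values are invented purely for illustration; a real model has tens of thousands of tokens and learned logits.

    # Toy temperature sampling: lower temperature sharpens the distribution,
    # higher temperature flattens it. All values below are made up.
    import math, random

    def sample(logits, temperature=0.8):
        scaled = [x / temperature for x in logits]
        m = max(scaled)
        weights = [math.exp(s - m) for s in scaled]
        return random.choices(range(len(logits)), weights=weights, k=1)[0]

    vocab = ["cow", "gear", "manure", "therefore"]
    logits = [2.1, 0.3, -1.0, 1.5]
    print(vocab[sample(logits, temperature=0.2)])   # almost always "cow"
    print(vocab[sample(logits, temperature=2.0)])   # much more varied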

Even when you manually introduce logical errors into a chain-of-thought trace, the model’s overall accuracy usually remains better than if it had produced no reasoning tokens at all. Unlike a strict forward- or backward-chaining proof system, the LLM’s reasoning relies on statistical association rather than brittle rule-following. In a way, that fuzziness is its strength because it generalizes instead of collapsing under contradiction.
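If you want to poke at that claim yourself, the experiment is easy to sketch. The model call below is a stub standing in for a real LLM API, so this only shows the shape of the comparison, not real numbers:

    # Sketch of the comparison: direct answer vs. chain-of-thought vs. a
    # chain-of-thought with one step deliberately corrupted. `ask_model`
    # is a placeholder; swap in a real API call to run it for real.
    def ask_model(prompt: str) -> str:
        return "...reasoning...\nFinal answer: 42"   # placeholder reply

    def final_answer(reply: str) -> str:
        return reply.rsplit("Final answer:", 1)[-1].strip()

    def accuracy(items, build_prompt):
        hits = sum(final_answer(ask_model(build_prompt(q))) == gold
                   for q, gold in items)
        return hits / len(items)

    items = [("What is 6 * 7?", "42")]               # tiny illustrative eval set
    direct    = accuracy(items, lambda q: q + "\nFinal answer:")
    with_cot  = accuracy(items, lambda q: q + "\nThink step by step, then give 'Final answer:'.")
    corrupted = accuracy(items, lambda q: q + "\nStep 1: 6 * 7 = 41 (wrong on purpose). Continue, then give 'Final answer:'.")
    # The observation above: `corrupted` usually still beats `direct`.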

Terr_ 3 days ago | parent [-]

Well put, and if it doesn't notice/collapse under introduced contradictions, that's evidence it's not the kind of reasoning we were hoping for. The "real thing" is actually brittle when you do it right.

czl 3 days ago | parent [-]

Human reasoning is, in practice, much closer to statistical association than to brittle rule-following. The kind of strict, formal deduction we teach in logic courses is a special, slow mode we invoke mainly when we’re trying to check or communicate something, not the default way our minds actually operate.

Everyday reasoning is full of heuristics, analogies, and pattern matches: we jump to conclusions, then backfill justification afterward. Psychologists call this “post hoc rationalization,” and there’s plenty of evidence that people form beliefs first and then search for logical scaffolding to support them. In fact, that’s how we manage to think fluidly at all; the world is too noisy and underspecified for purely deductive inference to function outside of controlled systems.

Even mathematicians, our best examples of deliberate, formal thinkers, often work this way. Many major proofs have been discovered intuitively and later found to contain errors that didn’t actually invalidate the final result. The insight was right, even if the intermediate steps were shaky. When the details get repaired, the overall structure stands. That’s very much like an LLM producing a chain of reasoning tokens that might include small logical missteps yet still landing on the correct conclusion: the “thinking” process is not literal step-by-step deduction, but a guided traversal through a manifold of associations shaped by prior experience (or training data, in the model’s case).

So if an LLM doesn’t collapse under contradictions, that’s not necessarily a bug; it may reflect the same resilience we see in human reasoning. Our minds aren’t brittle theorem provers; they’re pattern-recognition engines that trade strict logical consistency for generalization and robustness. In that sense, the fuzziness is the strength.

Terr_ 2 days ago | parent [-]

> The kind of strict, formal deduction we teach in logic courses is a special, slow mode

Yes, but that seems like moving the goalposts.

The stricter blends of reasoning are what everybody is so desperate to evoke from LLMs, preferably along with inhuman consistency, endurance, and speed. Just imagine the repercussions if a slam-dunk paper came out tomorrow, which somehow proved the architectures and investments everyone is using for LLMs are a dead-end for that capability.

crazygringo 2 days ago | parent | next [-]

> The stricter blends of reasoning are what everybody is so desperate to evoke from LLMs

This is definitely not true for me. My prompts frequently contain instructions that aren't 100% clear and that suggest what I want rather than formally specifying it, plus typos, mistakes, etc. The fact that the LLM usually figures out what I meant to say, like a human would, is a feature for me.

I don't want an LLM to act like an automated theorem prover. We already have those. Their strictness makes them extremely difficult to use, so their application is extremely limited.

czl 2 days ago | parent | prev [-]

I get the worry. AFAIK most of the current capex is going into scalable parallel compute, memory, and networking. That stack is pretty model-agnostic, similar to how all that dot-com fiber was not tied to one protocol. If transformers stall, the hardware is still useful for whatever comes next.

On reasoning, I see LLMs and classic algorithms as complements. LLMs do robust manifold following and associative inference. Traditional programs do brittle rule following with guarantees. The promising path looks like a synthesis where models use tools, call code, and drive search and planning methods such as MCTS, the way AlphaGo did. Think agentic systems that can read, write, execute, and verify.
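As a tiny illustration of that synthesis, here is a toy agent loop: a stubbed-out "model" proposes a tool call, ordinary code executes it exactly, and the verified result is fed back. The stub and tool names are invented; a real system would put an LLM and real tools behind the same loop.

    # Toy agent loop: the "model" proposes a tool call, plain code executes
    # it with exact semantics, and the result goes back into the transcript.
    # `fake_model` is a canned stand-in for a real LLM.
    import operator

    TOOLS = {"add": operator.add, "mul": operator.mul}

    def fake_model(history):
        # A real model would read `history` and decide the next step.
        return "CALL mul 6 7" if "RESULT" not in history[-1] else "ANSWER 42"

    history = ["TASK: compute 6 * 7"]
    while True:
        step = fake_model(history)
        if step.startswith("ANSWER"):
            print("final:", step.split()[1])
            break
        _, name, *args = step.split()
        result = TOOLS[name](*map(int, args))    # brittle-but-exact tool execution
        history.append(f"RESULT {result}")       # verified output returns to the model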

LLMs are strongest where the problem is language. Language co-evolved with cognition as a way to model the world, not just to chat. We already use languages to describe circuits, specify algorithms, and even generate other languages. That makes LLMs very handy for specification, coordination, and explanation.

LLMs can also statistically simulate algorithms, which is useful for having them think about those algorithms. But when you actually need the algorithm, it is most efficient to run the real thing in software or on purpose-built hardware. Let the model write the code, compose the tools, and verify the output, rather than pretending to be a CPU.
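A concrete version of "don't pretend to be a CPU": instead of having the model multiply large numbers token by token, have it emit a snippet and execute that. The emitted line below is a hypothetical stand-in for model output:

    # The string stands in for code a model might emit; executing it gives
    # exact arithmetic instead of token-by-token simulation, and the assert
    # is a cheap verification step.
    emitted_by_model = "result = 123456789 * 987654321"   # hypothetical model output
    scope = {}
    exec(emitted_by_model, scope)
    assert scope["result"] == 123456789 * 987654321
    print(scope["result"])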

To me the risk is not that LLMs are a dead end, but that people who do not understand them have unreasonable expectations. Real progress looks like building systems that use language to invent and implement better tools and route work to the right place. If a paper lands tomorrow showing that pure next-token prediction is not enough for formal reasoning, that would be an example of misunderstanding LLMs, not a stop sign. We already saw something similar when Minsky and Papert highlighted that single-layer perceptrons could not represent XOR, and the field later moved past that with multilayer networks. Hopefully we remember that and learn the right lesson this time.

rustystump 3 days ago | parent | prev [-]

The problem is that the overwhelming majority of the input it has in fact seen somewhere in the corpus it was trained on. Certainly not one-for-one, but easily a 98% match. This is the whole point the other person is trying to make, I think. The reality is that most language is 99% regurgitation, used to communicate an internal state in a very compressed form. That 1%, though, may be the magic that makes us human: we create net-new information unseen in the corpus.

crazygringo 3 days ago | parent | next [-]

> the overwhelming majority of the input it has in fact seen somewhere in the corpus it was trained on.

But it thinks just great on stuff it wasn't trained on.

I give it code I wrote that is not in its training data, using new concepts I've come up with in an academic paper I'm writing, and ask it to extend the code in a certain way in accordance with those concepts, and it does a great job.

This isn't regurgitation. Even if a lot of LLM usage is, the whole point is that it does fantastically with stuff that is brand new too. It's genuinely creating new, valuable stuff it's never seen before. Assembling it in ways that require thinking.

rustystump 3 days ago | parent | next [-]

I think you may be giving academic papers too much credit, or rather, even they often have only that 1% of genuinely new material in them.

crazygringo 3 days ago | parent [-]

I think you're missing the point. This is my own paper and these are my own new concepts. It doesn't matter if the definitions of the new concepts are only 1% of the paper; the point is they are the concepts I'm asking the LLM to use, and they are not in its training data.

Terr_ 3 days ago | parent [-]

How would one prove the premise that a concept is not present in the training data?

With how much data is being shoveled in there, our default assumption should be that significant components are present.

crazygringo 2 days ago | parent [-]

That would be a weird default assumption. It's not hard to come up with new ideas. In fact, it's trivial.

And if you want to know if a specific concept is known by the LLM, you can literally ask it. It generally does a great job of telling you what it is and is not familiar with.

zeroonetwothree 3 days ago | parent | prev [-]

I think it would be hard to prove that it's truly so novel that nothing similar is present in the training data. I've certainly seen in research that it's quite easy to miss related work even with extensive searching.

the_pwner224 3 days ago | parent | prev [-]

Except it's more than capable of solving novel problems that aren't in the training set and aren't a close match to anything in the training set. I've done it multiple times across multiple domains.

Creating complex Excel spreadsheet structures comes to mind; I just did that earlier today, and with plain GPT-5, not even -Thinking. Sure, maybe the Excel formulas themselves are a "98% match" to training data, but it takes real cognition (or whatever you want to call it) to figure out which ones to use, how to use them appropriately for a given situation, how to structure the spreadsheet, etc.

rustystump 3 days ago | parent [-]

I think people confuse "novel to them" with "novel to humanity." Most of our work is not so special.

the_pwner224 3 days ago | parent [-]

And what % of humans have ever thought things that are novel to humanity?

baq 3 days ago | parent | prev [-]

Brains are pretrained models, change my mind. (Not LLMs obviously, to be perfectly clear)

hamdingers 3 days ago | parent | next [-]

Brains continue learning from everything they do for as long as they're in use. Pretrained models are static after initial training.
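A toy way to see the difference: a frozen model's weights never move at inference time, while an online learner adjusts after every example. The numbers below are made up for illustration.

    # Frozen "pretrained" weight vs. an online learner that updates from a
    # stream of new examples. Values are invented for illustration.
    frozen_w, online_w, lr = 2.0, 2.0, 0.1

    for x, y in [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]:   # new experience arrives
        _ = frozen_w * x                # static model: predict, never update
        err = y - online_w * x          # learner: predict, then adjust
        online_w += lr * err * x

    print(frozen_w, round(online_w, 3))  # 2.0 stays put; the learner drifts toward the data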

zeroonetwothree 3 days ago | parent | prev [-]

If you are right, then I certainly cannot change your mind.

baq 2 days ago | parent [-]

Show a snake to a 1yo and explain how the kid’s reaction is not pretrained. It’s called instinct in biology, but the idea is the same.