cluckindan 4 days ago

It’s an LLM that has been trained and prompted to make users believe it is using logical reasoning to arrive at its output, when it is in fact still predicting the next output tokens, just like any other LLM.

There may be additional feedback loops, but fundamentally, that is what it is doing. Sure, it will show you the steps it takes to arrive at a conclusion, but it is just predicting the steps, the conclusion, and their apparent validity from its training data, not actually evaluating the logic or truthfulness of the output.
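To make “predicting tokens” concrete, here is a rough sketch of the sampling loop in Python (next_token_logits is a hypothetical stand-in for the trained model, not any real API); note that the “reasoning” trace and the final answer come out of the same loop:

    import math
    import random

    def softmax(logits):
        # Turn raw logits into a probability distribution over tokens.
        m = max(logits.values())
        exps = {tok: math.exp(v - m) for tok, v in logits.items()}
        total = sum(exps.values())
        return {tok: e / total for tok, e in exps.items()}

    def generate(prompt_tokens, next_token_logits, max_tokens=256, stop="<eos>"):
        tokens = list(prompt_tokens)
        for _ in range(max_tokens):
            probs = softmax(next_token_logits(tokens))
            tok = random.choices(list(probs), weights=list(probs.values()))[0]
            if tok == stop:
                break
            tokens.append(tok)  # "reasoning" steps and the final answer alike
        return tokens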

If you don’t believe me, ask your “reasoning” LLM this question: What’s the name of the paternal great-great-grandfather of the son of Jacob’s son’s son’s son?

BrawnyBadger53 4 days ago | parent | next [-]

Or, to put it less pessimistically: the models are trained to prime their own context window so that by the end of the chain they arrive at more valuable responses. By creating intermediary steps in the chain, each next step is easier to generate than jumping directly to the desired response. We call it reasoning because it is intuitively analogous to human reasoning methods, though it is understood that LLMs don't generalize as well as humans do.
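As a toy sketch of that framing (generate_step here is a hypothetical one-call wrapper around a model, not a real API), each intermediate step is appended to the context so the next prediction is conditioned on it:

    def answer_with_reasoning(question, generate_step, max_steps=8):
        context = question + "\nLet's think step by step.\n"
        for _ in range(max_steps):
            step = generate_step(context)   # predict the next intermediate step
            context += step + "\n"          # prime the context with that step
            if step.startswith("Answer:"):  # the model signals it is done
                return step
        return generate_step(context + "Answer:")  # force a conclusion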

mvdwoord 4 days ago | parent | prev | next [-]

Progress is hard to keep track of in this fast-paced environment, but aren't there already models that can call external tools and simply offload parts of the reasoning there? Maybe over MCP or some other mechanism, so a model can offload e.g. calculations, or test code in a sandbox, or even write code to answer part of a question, execute the code somewhere, and take the results into the rest of the inference process as context?

Or is there a more subtle issue which prevents or makes this hard?

Is there something fundamentally impossible about having a model recognize that counting the Rs in 'strawberry' is a string-search operation and then, in some sandbox, execute something like:

% echo "strawberry" | tr -dc "r" | wc -c

       3
It seems agents do this already, but regular GPT-style chat environments seem to lack it?
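Sketched in Python, the loop might look something like this (model_reply and run_sandboxed are hypothetical stand-ins for a real LLM call and a real sandbox, not any specific framework's API):

    import subprocess

    def run_sandboxed(cmd):
        # Stand-in for a real sandbox; here just a subprocess with a timeout.
        out = subprocess.run(cmd, shell=True, capture_output=True,
                             text=True, timeout=5)
        return out.stdout.strip()

    def agent_loop(question, model_reply, max_turns=4):
        transcript = question
        for _ in range(max_turns):
            reply = model_reply(transcript)   # hypothetical LLM call, returns a dict
            if reply.get("tool") == "shell":  # the model chose to offload
                result = run_sandboxed(reply["cmd"])
                transcript += "\n[tool result] " + result
            else:
                return reply["answer"]        # the model answered directly
        return None

For the strawberry question, model_reply might emit {"tool": "shell", "cmd": "echo strawberry | tr -dc r | wc -c"}, see "3" come back as context, and answer from that.
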
yunohn 4 days ago | parent [-]

My observation of AI progress over the past two years has been that LLM companies focus purely on raw model knowledge instead of optimised, usable tooling. Unsure when this will ever change, but that’s why your example is not yet the industry standard.

mvdwoord 4 days ago | parent [-]

My intuition, which is of course woefully inadequate in this area, says there is a ton of accuracy to be gained, and I feel there is also a lot of potential for offloading, and therefore for pruning, or for putting the rest of the parameters to better use...

Anyway, let me refresh my page, as I am sure some new model architecture is dropping while I type this. ;)

Varelion 4 days ago | parent | prev | next [-]

Let's break this down carefully, step by step.

Start with Jacob.

Jacob’s son → call him A.

A’s son → call him B.

B’s son → call him C.

C’s son → call him D (this is “the son of Jacob’s son’s son’s son”).

Now the question asks for the paternal great-great-grandfather of D:

D’s father → C

D’s grandfather → B

D’s great-grandfather → A

D’s great-great-grandfather → Jacob

Answer: Jacob
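
The same count, checked mechanically (a throwaway sketch, not a claim about how any model computes it):

    chain = ["Jacob", "A", "B", "C", "D"]  # chain[i+1] is the son of chain[i]
    d = chain.index("D")
    print(chain[d - 4])                    # 4 paternal steps up -> Jacob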

sindriava 4 days ago | parent | prev | next [-]

I won't read this because you're not really thinking, just pressing keyboard keys.

cluckindan 4 days ago | parent [-]

Joke’s on you, I dictated it.

sindriava 4 days ago | parent [-]

Rich coming from the guy who moved his muscles until sounds came out.

Also, next time you should at least bother to copy-paste your question into any recent LLM, since they can all solve it without issue. But hallucinations like this are common with non-reasoning HN users.

cluckindan 4 days ago | parent [-]

But can they solve it without referring to the Bible, or without mentioning anyone in the biblical Jacob’s family tree?

Don’t think so. Humans solve that puzzle in a very different way than LLMs “reason” about it.

astrange 4 days ago | parent | next [-]

GPT5 and DeepThink both solved it without doing that for me, yes.

(DeepThink did wonder if it was supposed to be him afterwards or if it was a trick.)

cluckindan 4 days ago | parent [-]

Yesterday, GPT5 was producing Bible answers. I guess the developers are lurking here. :-)

Adding a second question like “Is Abraham included in the family tree?” still makes it regress into mentioning Isaac, Judah, Joseph, 12 sons and whatnot.

nerpderp82 4 days ago | parent | prev [-]

There can be more than one intelligence. Nature has shown us that there are many. And many which can "outsmart" a human.

freejazz 4 days ago | parent | prev | next [-]

Thank you. I do not have a "reasoning" LLM, and I have not found LLMs very useful in my life, so I do not really engage with them outside of reading about them here and in other places.

frozenseven 4 days ago | parent | prev [-]

This isn't an explanation. Just another "AI bad!" comment.