int_19h 7 hours ago

I often wonder how people can look at a log like this and still confidently state that this isn't reasoning.

quinnjh 6 hours ago | parent | next [-]

It (the thinking steps) has moments of brilliance, generally convincing-looking steps, and improved outputs. Whether that is reasoning seems to be a matter of interpretation.

From skimming the log:

> After popping the 2, the stack is [X1], then pushing X2 would make it [X2, X1]? No, because pushing adds to the top. So after popping 2, the stack is [X1], then pushing X2 adds it on top → [X2, X1].

> Wait, no, when you push, you add to the top. So after popping the 2, the stack is [X1], then pushing X2 would make it [X2, X1]? No, wait, the stack is LIFO. So pushing X2 would put it on top of X1 → stack becomes [X2, X1]? No, no. Wait, after popping the 2, the stack is [X1]. Then pushing X2 would make the stack [X2, X1]? No, no. Wait, when you push, the new element is added to the top. So after popping the 2 (so stack is [X1]), then pushing X2 gives [X2, X1]? No, no. Wait, the stack was [X1], then pushing X2 would make it [X2] on top of X1 → so stack is [X2, X1]? Yes, exactly.
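For reference, the stack behavior the trace keeps re-deriving takes three lines of Python (a sketch; X1 and X2 are the log's placeholders, and a Python list pushes and pops at the end, so the top of the stack is the last element):

    stack = ["X1", 2]     # top of stack is the last element: 2
    stack.pop()           # pop the 2 -> stack is ["X1"]
    stack.append("X2")    # push X2 on top
    print(stack)          # ["X1", "X2"], i.e. the log's [X2, X1] written top-first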

garciasn 6 hours ago | parent | prev [-]

Depends on the definition of reasoning:

1) think, understand, and form judgments by a process of logic.

-- LLMs do not think, nor do they understand; they also cannot form ‘judgments’ in any human-relatable way. They’re just producing the most statistically likely output their training data permits.

2) find an answer to a problem by considering various possible solutions

-- LLMs can produce a result that may be an answer, after producing various candidate results that a human must verify as accurate, but they don’t do this in any human-relatable way either.

--

So: while LLMs continue to be amazing mimics, and thus APPEAR to be great at ‘reasoning’, they aren’t doing anything of the sort today.

CamperBob2 5 hours ago | parent [-]

Exposure to our language is sufficient to teach the model how to form human-relatable judgements. The ability to execute tool calls and evaluate the results takes care of the rest. It's reasoning.

garciasn 5 hours ago | parent [-]

    SELECT next_word, likelihood_stat FROM context ORDER BY 2 DESC LIMIT 1

is not reasoning; it just appears that way due to Clarke’s third law.
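In Python terms, that query is roughly greedy decoding. A minimal sketch, with a made-up probability table standing in for the model's output:

    # Greedy decoding: take the single most probable next token.
    # The probability table here is invented for illustration.
    probs = {"the": 0.41, "a": 0.22, "cat": 0.09}
    next_word = max(probs, key=probs.get)  # ORDER BY 2 DESC LIMIT 1
    print(next_word)                       # "the"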

int_19h 4 hours ago | parent | next [-]

Sure, at the end of the day it selects the most probable token - but it has to compute the token probabilities first, and that's the part where it's hard to see how it could possibly produce a meaningful log like this without some form of reasoning (and a world model to base that reasoning on).

So, no, this doesn't actually answer the question in a meaningful way.
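To make that split concrete, here is a toy sketch; every name in it (the vocabulary, the weight matrix, the context vector) is invented, and a single random matrix stands in for a real network's forward pass:

    import numpy as np

    rng = np.random.default_rng(0)
    vocab = ["the", "a", "cat", "sat"]

    # Stand-in for the forward pass. In a real LLM this step
    # (embeddings, attention, MLPs over the whole context) is
    # where essentially all of the computation happens.
    W = rng.normal(size=(8, len(vocab)))

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    context_vec = rng.normal(size=8)     # toy context representation
    probs = softmax(context_vec @ W)     # probabilities over the vocab
    print(vocab[int(np.argmax(probs))])  # the trivial "LIMIT 1" step

The argmax at the end is the "SELECT ... LIMIT 1" part; everything interesting happened in the line that produced the logits.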

CamperBob2 5 hours ago | parent | prev [-]

(Shrug) You've already had to move your goalposts to the far corner of the parking garage down the street from the stadium. Argument from ignorance won't help.