ian_j_butler a day ago

It's well-known that the reasoning model output is not necessarily faithful to the content of the thinking scratch pad anyway, even if you had it unsummarized and available verbatim.

Setting aside coding agents, we really need this information to even pretend to evaluate claims of things like mathematical breakthroughs, which is exactly why we will never see it. It would be very embarrassing to get the right answer for the wrong reason. But to give the models some credit, you could argue that paying too much attention to the thinking misunderstands how CoT works. The argument would be that thinking in LLMs isn't really thinking; it's self-reinforcement and circling that encourages stability around beneficial attractors instead of degenerate ones. You can't have it both ways, though. Either the thinking is thinking, and so it should be correct, or the thinking is NOT thinking, it's NOT real justification for the outcome, and these systems are even more hopelessly opaque than we usually assume.

handoflixue 16 hours ago

> we really need this information to even pretend to evaluate the claims of stuff like mathematical breakthroughs

Why?

Either the proof is correct, or it isn't, right?

And it either produces them reliably or not, right?

Like, even if its reasoning is completely wrong and it's only producing correct answers 10% of the time, that's still an astounding amount above baseline and a useful tool.

Humans have inaccurate thinking all the time, and are also pretty hopelessly opaque. "It came to me in a dream" is a major plot point in the history of math. I'd still trust Ramanujan more than most mathematicians, since he got the right answer.

anuramat a day ago

> NOT real justification

I thought it was widely accepted that it's not; e.g. https://www.anthropic.com/research/natural-language-autoenco...

ian_j_butler 21 hours ago

Right, I don't think researchers are confused on this point. The Anthropic piece is good outreach / science comms. OTOH, this thread has like 200 comments and no mention of faithful/faithless reasoning. The idea that "of course the models can reason, and here is the proof/artifact" is probably closer to the general understanding. That's kinda the whole setup for TFA and all the rest of the thread.

But the nuance under discussion here is exactly the kind of stuff you people take for granted in the AGI or reasoning threads. If it's practically relevant for tools/workflows with Claude Code, it's a good angle; maybe people will be more willing to pay attention to the details.