dgb23 4 hours ago

Didn't you notice that the stream is incoherent or noisy? Sometimes it goes from thought A to thought B then action C, but A was entirely unnecessary noise that had nothing to do with B and C. I also sometimes saw signals in the thinking output that were red flags, or, as you said, it got confused, but then it didn't matter at all. Now I just never look at the thinking tokens anymore, because I got bamboozled too often.

Perhaps when you summarize it you miss some of these, or you're otherwise doing things differently.

gck1 4 hours ago

The usefulness of thinking tokens in my case might come down to the conditions I have Claude working in.

I primarily use Claude for Rust, with what I call a masochistic lint config. Compiler and lint errors almost always trigger extended thinking when adaptive thinking is on, and that's where these tokens become a goldmine. They reveal whether the model actually considered the right way to fix the issue. Sometimes it recognizes that ownership needs to be refactored. Sometimes it identifies that the real problem lives in a crate that is for some reason "out of scope" even though it's right there in the workspace, and then concludes with something like "the pragmatic fix is to just duplicate it here for now."

So yes, the resulting code works, and by some definition the model did the correct thing. But to me, "correct" doesn't just mean working, it means maintainable. And on that question, the thinking tokens are almost never wrong or useless. Claude gets things done, but it's extremely "lazy".

gck1 an hour ago

Also, for anyone using Opus with Claude Code: they "broke" the thinking summaries again, even if you had "showThinkingSummaries": true in your settings.json [1]

You have to pass the `--thinking-display summarized` flag explicitly.

[1] https://github.com/anthropics/claude-code/issues/49268