a2128 a day ago

> Finally, we've introduced thinking summaries for Claude 4 models that use a smaller model to condense lengthy thought processes. This summarization is only needed about 5% of the time—most thought processes are short enough to display in full. Users requiring raw chains of thought for advanced prompt engineering can contact sales about our new Developer Mode to retain full access.

I don't want to see a "summary" of the model's reasoning! If I want to make sure the model's reasoning is accurate and that I can trust its output, I need to see the actual reasoning. It greatly annoys me that OpenAI and now Anthropic are moving towards a system of hiding the model's thinking process, charging users for tokens they cannot see, and providing "summaries" that make it impossible to tell what's actually going on.

layoric a day ago | parent | next [-]

There are several papers suggesting that the 'thinking' output is largely meaningless to the final output, and that filler like dots or pause tokens, which buys the model the same additional rounds of computation, yields similar improvements.

So in a lot of regards the 'thinking' is mostly marketing.

- "Think before you speak: Training Language Models With Pause Tokens" - https://arxiv.org/abs/2310.02226

- "Let's Think Dot by Dot: Hidden Computation in Transformer Language Models" - https://arxiv.org/abs/2404.15758

- "Do LLMs Really Think Step-by-step In Implicit Reasoning?" - https://arxiv.org/abs/2411.15862

- Video by bycloud as an overview -> https://www.youtube.com/watch?v=Dk36u4NGeSU
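
To make that concrete, here is a rough sketch of the inference-time version of the test (the papers themselves mostly train with pause/filler tokens, so this is only a loose approximation, not a reproduction of their setups). It assumes the Hugging Face transformers library; the model name, prompt, and filler length are arbitrary placeholder choices:

    # Loose sketch: compare a model's answer when it writes its own chain of
    # thought versus when the "thinking" region is pre-filled with dots.
    # Assumes transformers + a small local chat model; all names are placeholders.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "Qwen/Qwen2.5-1.5B-Instruct"  # any small local instruct model
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

    question = "A train travels 120 km in 1.5 hours. What is its average speed in km/h?"

    def answer(prefill: str) -> str:
        # Pre-fill the start of the assistant turn, then let the model continue.
        messages = [{"role": "user", "content": question}]
        prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
        inputs = tok(prompt + prefill, return_tensors="pt").to(model.device)
        out = model.generate(**inputs, max_new_tokens=128, do_sample=False)
        return tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)

    # Case 1: the model produces its own chain of thought before answering.
    print(answer("Let me think step by step. "))
    # Case 2: the "thinking" is replaced with meaningless filler of similar length.
    print(answer("." * 200 + "\nAnswer: "))

If the dot-filled variant answers about as well, that supports the papers' point that the legible reasoning text isn't what is doing the work.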

Davidzheng a day ago | parent | next [-]

Lots of papers are insane. You can test it on competition math problems with a local AI and replace its thinking process with dots and see the result yourself.

WA 18 hours ago | parent [-]

So what's the result? I don't have a local LLM. Are the "dot papers" insane or the "thinking in actual reasoning tokens" insane?

cayley_graph a day ago | parent | prev [-]

Wow, my first ever video on AI! I'm rather disappointed. That was devoid of meaningful content save for the two minutes where they went over the Anthropic blog post on how LLMs (don't) do addition. Importantly, they didn't remotely approach what those other papers are about, or why thinking tokens aren't important for chain-of-thought. Is all AI content this kind of slop? Sorry, no offense to the above comment, it was just a total waste of 10 minutes that I'm not used to.

So, to anyone more knowledgeable than the proprietor of that channel: can you outline why it's possible to replace thinking tokens with garbage without a decline in output quality?

edit: Section J of the first paper seems to offer some succinct explanations.

layoric 21 hours ago | parent [-]

The video is just an entertaining overview, as indicated; I'm not the author of the video, and it wasn't meant to be a deep dive. I linked the three related papers directly in there. I don't know how much more you are expecting from an HN comment, but this was a pointer in the right direction, not the definitive guide on the matter. This is a you problem.

cayley_graph 20 hours ago | parent [-]

An overview of what? It's entertaining to me when I come away understanding something more than I did before. I expected a high level explanation of the papers, or the faintest intuition behind the phenomenon your comment talked about.

If you watched the video, it doesn't actually say anything besides restating variants of "thinking tokens aren't important" in a few different ways, summarizing a distantly related blog post, and entertaining some wild hypotheses about the future of LLMs. It's unclear if the producer has any deeper understanding of the subject; it honestly sounded like some low grade LLM generated fluff. I'm simply not used to that level of lack-of-substance. It wasn't a personal attack against you, as indicated.

kovezd a day ago | parent | prev | next [-]

Don't be so concerned. There's ample evidence that thinking is often disassociated from the output.

My take is that this is a user experience improvement, given how few people actually go on to read the thinking process.

padolsey a day ago | parent | next [-]

If we're paying for reasoning tokens, we should be able to have access to these, no? Seems reasonable enough to allow access, and then we can perhaps use our own streaming summarization models instead of relying on these very generic-sounding ones they're pushing.
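
As a minimal sketch of that idea, assuming the Anthropic Python SDK's streaming Messages API with extended thinking enabled (thinking_delta events), and with my_summarizer as a hypothetical stand-in for whatever local model you'd actually use:

    # Sketch: collect the raw reasoning tokens yourself as they stream, then run
    # them through your own summarizer instead of the provider's built-in one.
    # Assumes the Anthropic Python SDK; my_summarizer is a hypothetical stand-in.
    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    def my_summarizer(text: str) -> str:
        # Placeholder: swap in whatever small local model you prefer.
        return text[:200] + ("..." if len(text) > 200 else "")

    thinking_chunks = []
    with client.messages.stream(
        model="claude-sonnet-4-20250514",  # placeholder model id
        max_tokens=2048,
        thinking={"type": "enabled", "budget_tokens": 1024},
        messages=[{"role": "user", "content": "Is 7919 prime?"}],
    ) as stream:
        for event in stream:
            # Extended-thinking deltas arrive as content_block_delta events.
            if event.type == "content_block_delta" and event.delta.type == "thinking_delta":
                thinking_chunks.append(event.delta.thinking)

    print(my_summarizer("".join(thinking_chunks)))

Whether this stays workable obviously depends on the raw thinking continuing to be exposed over the API, which is exactly what the announcement puts in question.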

user_7832 19 hours ago | parent | prev | next [-]

> There's ample evidence that thinking is often disassociated from the output.

What kind of work do you use LLMs for? For the semi-technical “find flaws in my argument” thing, I find it generally better at not making common or expected fallacies or assumptions.

Davidzheng a day ago | parent | prev [-]

then provide it as an option?

skerit 13 hours ago | parent | prev | next [-]

Are they referring to their own chat interface? Because the API still streams the thinking tokens immediately.

khimaros a day ago | parent | prev [-]

I believe Gemini 2.5 Pro also does this

izabera a day ago | parent | next [-]

I am now focusing on checking your proposition. I am now fully immersed in understanding your suggestion. I am now diving deep into whether Gemini 2.5 pro also does this. I am now focusing on checking the prerequisites.

patates 21 hours ago | parent | prev [-]

It does now, but I think it wasn't the case before?