Remix.run Logo
energy123 a day ago

  > Finally, we've introduced thinking summaries for Claude 4 models that use a smaller model to condense lengthy thought processes. This summarization is only needed about 5% of the time—most thought processes are short enough to display in full.
This is not better for the user. No users want this. If you're doing this to prevent competitors training on your thought traces then fine. But if you really believe this is what users want, you need to reconsider.
comova a day ago | parent | next [-]

I believe this is to improve performance by shortening the context window for long thinking processes. I don't think this is referring to real-time summarizing for the users' sake.

usaar333 a day ago | parent | next [-]

When you do a chat are reasoning traces for prior model outputs in the LLM context?

int_19h a day ago | parent [-]

No, they are normally stripped out.

j_maffe a day ago | parent | prev [-]

> I don't think this is referring to real-time summarizing for the users' sake.

That's exactly what it's referring to.

dr_kiszonka a day ago | parent | prev | next [-]

I agree. Thinking traces are the first thing I check when I suspect Claude lied to me. Call me cynical, but I suspect that these new summaries will conveniently remove the "evidence."

throwaway314155 a day ago | parent | prev | next [-]

If _you_ really believe this is what all users want, _you_ should reconsider. Your describing a feature for power users. It should be a toggle but it's silly to say it didn't improve UX for people who don't want to read strange babbling chains of thought.

energy123 a day ago | parent | next [-]

You're accusing me of mind reading other users, but then proceed to engage in your own mind reading of those same users.

Have a look in Gemini related subreddits after they nerfed their CoT yesterday. There's nobody happy about this trend. A high quality CoT that gets put through a small LLM is really no better than noise. Paternalistic noise. It's not worth reading. Just don't even show me the CoT at all.

If someone is paying for Opus 4 then they likely are a power user, anyway. They're doing it for the frontier performance and I would assume such users would appreciate the real CoT.

porridgeraisin a day ago | parent | prev [-]

Here's an example non-power-user usecase for CoT:

Sometimes when I miss to specify a detail in my prompt and it's just a short task where I don't bother with long processes like "ask clarifying questions, make a plan and then follow it" etc etc, I see it talking about making that assumption in the CoT and I immediately cancel the request and edit the detail in.

a day ago | parent | prev [-]
[deleted]