giancarlostoro 5 hours ago

I've been talking to friends about this extensively, and I've read all sorts of social media posts on X where people dove deep into it (I'm at work so I don't have links handy, though I did submit one on HN; grain of salt, I'm unsure how valid it is, but it was interesting: https://news.ycombinator.com/item?id=47752049 ).

I think the real issue stems from the 1 million token context window change. They did not anticipate the amount of load it would create. In the first few days after they released the new token window, I was making amazing things in a single session, going from nothing to something (a new .NET-based programming language inspired by Python, and a Virtual Actor framework in Rust). I think since then they've been tweaking too many things at once, irritating their users in the process.

They even added a new "Max" thinking mode and made the new "High" what used to be "Medium", which is ridiculous because you think you're using "High" but really you're not. There's a hidden config file to change their terrible defaults and let Claude be smarter still, and apparently you can toggle off the 1M tokens.

I think the real fix, and I'm surprised nobody there has done this yet, is to let the user trim down their context window.

Think about it: you used to have what, 350k tokens or so? Now Claude will keep sending your completely irrelevant prompt from 30 minutes ago to the back-end, whereas 3 months ago it would have been compacted away by now.
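A user-controlled trim could be as simple as dropping the oldest turns before each request. Here's a minimal sketch of the idea (the chars/4 token estimate and the message shape are my assumptions, not Claude Code's actual internals):

```python
def trim_context(messages, max_tokens,
                 count_tokens=lambda m: len(m["content"]) // 4):
    """Drop the oldest non-system turns until the estimated total fits.

    The chars/4 heuristic is a stand-in for a real tokenizer; swap in
    an accurate counter if you have one.
    """
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    while rest and sum(map(count_tokens, system + rest)) > max_tokens:
        rest.pop(0)  # oldest user/assistant turn goes first
    return system + rest
```

That 30-minute-old prompt would be the first thing to go, instead of riding along in every request.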

Others have noted that similar prompting now adds tens of thousands of extra garbage tokens, and nobody seems to know why.

Edit: looks like someone figured out that if you downgrade your version of Claude Code and change one single setting, it un-ruins Claude:

https://news.ycombinator.com/item?id=47769879

SkyPuncher 2 hours ago

Yeah, I've realized that if I stay under 200k tokens I basically don't have usage issues any more.

A bit annoying, but not the end of the world.

consumer451 27 minutes ago

Here is the question for which I cannot find an answer, and cannot yet afford to answer myself:

In Claude Code, I use Opus 4.6 1M, but stay under 250k via careful session management to avoid the known NoLiMa [0] / context rot [1] problems. The question I keep wanting answered: at ~165k tokens used, does Opus 1M actually deliver higher quality than Opus 200k? NoLiMa would suggest that 200k Opus would degrade and Opus 1M would be better... but they are supposedly the same model. However, there are practical inference deployment differences that could change the whole paradigm, right? I am so confused.

Anthropic says it's the same model [2], but Claude Code's own source treats them as distinct variants with separate routing [3]. The closest test I found [4] asserts they're identical below 200K, but it never actually A/B tests them, correct?

Inside Claude Code it's probably not testable, right? According to this issue [5], the CLI is non-deterministic for identical inputs, and agent sessions branch on tool use. You'd need a clean API-level test.

The API level test is what I really want to know for the Claude based features in my own apps. Is there a real benchmark for this?
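For what it's worth, the API-level comparison could be as simple as running the same needle-in-a-haystack task against each variant and comparing retrieval rates. This is just a harness sketch: the `call_model` functions (one per model variant, temperature 0) are left to you, and the lorem-filler task is a toy stand-in, not a real benchmark like NoLiMa:

```python
import random

def build_needle_prompt(n_filler_words, seed=0):
    """Bury one retrievable fact at a random position in filler text.

    Returns (prompt, expected_answer).
    """
    rng = random.Random(seed)
    answer = str(rng.randint(1000, 9999))
    words = ["lorem"] * n_filler_words
    words.insert(rng.randrange(len(words) + 1),
                 f"The secret code is {answer}.")
    return " ".join(words) + "\n\nWhat is the secret code?", answer

def retrieval_rate(call_model, n_trials=20, n_filler_words=120_000):
    """Fraction of trials where the reply contains the buried fact.

    `call_model` is any prompt -> reply-text function, e.g. a thin
    wrapper around whichever model variant you want to measure.
    """
    hits = 0
    for seed in range(n_trials):
        prompt, answer = build_needle_prompt(n_filler_words, seed)
        if answer in call_model(prompt):
            hits += 1
    return hits / n_trials
```

Run `retrieval_rate` once with a wrapper around the 200k variant and once with the 1M one, same seeds, and compare. The non-determinism you mention still adds noise, so you'd want enough trials for any gap to be meaningful.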

I have reached the limits of my understanding on this problem. If what I am trying to say makes any sense, any help would be greatly appreciated.

If you could help me ask the question better, that would also be appreciated.

[0] https://arxiv.org/abs/2502.05167

[1] https://research.trychroma.com/context-rot

[2] https://claude.com/blog/1m-context-ga

[3] https://github.com/anthropics/claude-code/issues/35545

[4] https://www.claudecodecamp.com/p/claude-code-1m-context-wind...

[5] https://github.com/anthropics/claude-code/issues/3370

dacox 4 hours ago

Yeah, I have been seeing lots of comments, tweets, etc., but given everything I have learned about these models, I do not think the change to 1M was innocuous. I'm not sure what they've claimed publicly, but I'm fairly certain they must be doing additional quantization, or at minimum additional quantization of the KV cache. Plus, sequence length can change things even when the window isn't fully utilized. I also had to manually re-enable the "clear context and continue" feature.

giancarlostoro 4 hours ago

I used the heck out of it when it was announced, and it felt like one of the best models I've ever used. But so did all of their other customers, and I don't think they accounted for such heavy load; or maybe follow-up changes goofed something up, I'm not sure. Like I said, for the first few days the 1M token window let me bust out some interesting projects in one session, from nothing to "oh my" in no time.

I'm thinking they should go back to all their old settings, cap users at the old token limit by default, and ask whether you want to compact at your "soft" limit or burst for a little longer to finish a task.