giancarlostoro | 5 hours ago
I've been talking to friends about this extensively, and I've read all sorts of social media posts on X where people did deep dives (I'm at work so I don't have any links handy, though I did submit one on HN; grain of salt, I'm unsure how valid it is, but it was interesting: https://news.ycombinator.com/item?id=47752049 ).

I think the real issue stems from the 1-million-token context window change. They did not anticipate the amount of load it would add. In the first few days after they released the new token window, I was making amazing things in a single session, from nothing to something (a new .NET-based programming language inspired by Python, and a Virtual Actor framework in Rust). Since then I think they've been trying too many tweaks, while irritating their users. They even added a new "Max" thinking mode and made "High" the old medium, which is ridiculous because you think you're using "High" but really you're not. There's a hidden config file to change their terrible defaults and let Claude be smarter still, and apparently you can toggle off the 1M tokens.

I think the real fix, and I'm surprised nobody there has done it yet, is to let the user trim down their context window. Think about it: you used to have what, 350k tokens or so? Now Claude will keep sending your prompt from 30 minutes ago, completely irrelevant, to the back-end, whereas 3 months ago it would have been compacted by now. Others have noted that similar prompting, for some ungodly reason, adds tens of thousands of extra garbage tokens (not sure why).

Edit: it looks like someone figured out that if you downgrade your version of Claude Code and change one single setting, it un-ruins Claude.
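The "let the user trim their context window" idea above can be sketched in a few lines. This is a hypothetical client-side helper, not Claude Code's actual API or config: `token_count` is a crude stand-in (a real client would use the provider's tokenizer), and the budget number is arbitrary.

```python
# Hypothetical sketch of user-controlled context trimming: keep the most
# recent messages that fit under a token budget, dropping stale turns
# instead of resending a 30-minute-old irrelevant prompt.

def token_count(text):
    # Rough heuristic: ~4 characters per token (an assumption, not exact).
    return max(1, len(text) // 4)

def trim_context(messages, budget_tokens):
    """Keep the newest messages whose combined token count fits the budget."""
    kept = []
    total = 0
    for msg in reversed(messages):          # walk from newest to oldest
        cost = token_count(msg)
        if total + cost > budget_tokens:
            break                           # everything older gets dropped
        kept.append(msg)
        total += cost
    return list(reversed(kept))             # restore chronological order

history = ["old prompt " * 100, "stale follow-up " * 50, "current question?"]
trimmed = trim_context(history, budget_tokens=100)
```

With this shape the user picks `budget_tokens` themselves, which is exactly the knob the comment is asking for.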
SkyPuncher | 2 hours ago
Yea, I've realized that if I stay under 200k tokens I basically don't have usage issues anymore. A bit annoying, but not the end of the world.
dacox | 4 hours ago
Yeah, I have been seeing lots of comments, tweets, etc., but given everything I have learned about these models, I do not think the change to 1M was innocuous. I'm not sure what they've claimed publicly, but I'm fairly certain they must be doing additional quantization, or at minimum additional quantization of the KV cache. Plus, sequence length can change things even when it isn't fully utilized. I had to manually re-enable the "clear context and continue" feature as well.
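The KV-cache pressure is easy to see with back-of-envelope arithmetic. The dimensions below are illustrative assumptions, not Anthropic's actual architecture; the point is only how the cache scales with sequence length.

```python
# Back-of-envelope KV-cache memory: 2 tensors (K and V) per layer, each of
# shape (seq_len, kv_heads * head_dim), times bytes per element.
# All model dimensions here are assumed for illustration.

def kv_cache_bytes(seq_len, layers=80, kv_heads=8, head_dim=128, bytes_per_elem=2):
    return 2 * layers * seq_len * kv_heads * head_dim * bytes_per_elem

GB = 1024 ** 3
at_200k = kv_cache_bytes(200_000) / GB
at_1m = kv_cache_bytes(1_000_000) / GB
# The cache is linear in seq_len, so 200k -> 1M context is exactly 5x the
# memory, which is why quantizing the KV cache (e.g. fp16 -> int8, halving
# bytes_per_elem) becomes tempting at long context.
```

That linear growth is why a longer advertised context can quietly change serving economics even for sessions that never fill it, if capacity is provisioned for the worst case.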