convenwis 17 hours ago

Is there a writeup anywhere on what this means for effective context? I think that many of us have found that even when the context window was 100k tokens the actual usable window was smaller than that. As you got closer to 100k performance degraded substantially. I'm assuming that is still true but what does the curve look like?

esperent 11 hours ago | parent | next [-]

> As you got closer to 100k performance degraded substantially

In practice, I haven't found this to be the case at all with Claude Code using Opus 4.6. So maybe it's another one of those things that used to be true, and now we all expect it to be true.

And of course when we expect something, we'll find it, so any mistakes at 150k context use get attributed to the context, while the same mistake at 50k gets attributed to the model.

peacebeard 5 hours ago | parent [-]

My personal experience is that Opus 4.6 degrades after a while but the degradation is more subtle and less catastrophic than in the past. I still aggressively clear sessions to keep it sharp though.

dcre 8 hours ago | parent | prev | next [-]

Personally, even though performance up to 200k has improved a lot with 4.5 and 4.6, I still try to avoid getting up there — like I said in another comment, when I see context getting up to even 100k, I start making sure I have enough written to disk to type /new, pipe it the diff so far, and just say “keep going.” I feel like the dropoff starts around maybe 150k, but I could be completely wrong. I thought it was funny that the graph in the post starts at 256k, which conveniently avoids showing the dropoff I'm talking about (if it's real).

tyleo 13 hours ago | parent | prev | next [-]

I mentioned this at work, but context still rots at the same rate: 90k tokens consumed gives just as bad results in a 100k context window as in a 1M one.

Personally, I’m on a 6M+ line codebase and had no problems with the old window. I’m not sending it blindly into the codebase though like I do for small projects. Good prompts are necessary at scale.

minimaxir 17 hours ago | parent | prev | next [-]

The benchmark charts provided are the writeup. Everything else is just anecdata.

FartyMcFarter 12 hours ago | parent | prev [-]

Isn't transformer attention quadratic in context size? To achieve a 1M-token context, I think these models have to be employing a lot of shortcuts.

I'm not an expert but maybe this explains context rot.
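The quadratic claim above can be checked with back-of-the-envelope arithmetic: dense self-attention builds an n × n score matrix, so doubling the context quadruples that matrix. A tiny illustrative sketch (the function name is made up for illustration):

```python
# Dense self-attention compares every query token against every key token,
# producing an n x n score matrix per head, per layer.
def attn_score_entries(n_tokens: int) -> int:
    """Entries in the dense attention score matrix for a context of n tokens."""
    return n_tokens * n_tokens

for n in (100_000, 1_000_000):
    print(f"{n:>9} tokens -> {attn_score_entries(n):,} score entries")

# 10x the context length means 100x the score-matrix work:
print(attn_score_entries(1_000_000) // attn_score_entries(100_000))  # 100
```

This is only the scaling of the score matrix, not a full cost model, but it shows why naive dense attention gets expensive fast at 1M tokens.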

vlovich123 11 hours ago | parent [-]

Nope, there are no tricks, unless there have been major architectural shifts I missed. The rot doesn’t come from inference tricks meant to bring down the quadratic cost of attention. Task performance problems are generally a training problem: the longer the sequences, the fewer examples you have to train on. So how do you train the model to behave well at long context — that’s where the tricks are. I believe most of it relies on synthetically generated data, if I’m not mistaken, which explains the rot.

FartyMcFarter 3 hours ago | parent [-]

A quick Google search reveals terms such as "sparse attention" that are used to avoid quadratic runtime.

I don't know if Anthropic has revealed such details since AI research is getting more and more secretive, but the architectural tricks definitely exist.
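To make the "sparse attention" idea concrete (purely illustrative — nothing here is known about Anthropic's actual architecture): a causal sliding-window mask lets each token attend only to the previous w tokens, cutting the score-matrix work from O(n²) to O(n·w).

```python
# Sliding-window sparse attention: each query attends to at most `window`
# of the most recent keys (including itself), instead of all n keys.
def sliding_window_pairs(n_tokens: int, window: int) -> int:
    """Count the (query, key) pairs a causal sliding-window mask keeps."""
    total = 0
    for q in range(n_tokens):
        total += min(window, q + 1)  # query q sees at most `window` keys
    return total

n, w = 10_000, 256
dense = n * n
sparse = sliding_window_pairs(n, w)
print(f"dense: {dense:,}  sparse: {sparse:,}  savings: {dense / sparse:.0f}x")
```

Real long-context systems combine tricks like this with occasional global tokens or full-attention layers so distant information can still propagate; the window size `w` here is an arbitrary choice for the sketch.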