SyneRyder 4 hours ago

Not only that, but I'm using Opus 4.8 [1m] right now outside the US, and suddenly I only have a 500k context window. I really hope this is just a strange Claude Code bug, but I had access to a 1 Million window before, and it wouldn't entirely surprise me if context window length becomes another US export restriction.

The Anthropic page here seems to say that Max users should have access to the full 1 Million window for 4.8:

https://support.claude.com/en/articles/8606394-how-large-is-...

I was already setting up my infra to experiment with GLM 5.2 and its 1 Million token window before this happened. I think I'm glad I did.

EDIT: Found a solution. It seems Claude Code 2.1.193 (or an earlier version I didn't notice) changed the default settings so that, if Autocompact is turned on, compaction triggers at 50% of the context window. If you turn off Autocompact, the full 1 Million context window is restored. Another example of Claude Code quietly changing default settings, sigh.
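
A toy sketch of the arithmetic, assuming compaction triggers at a fixed fraction of the window. The function name and the 0.5 default are illustrative only, taken from what I observed, and are not Claude Code's actual settings or API:

```python
def usable_context(window_tokens: int, autocompact: bool, threshold: float = 0.5) -> int:
    """Tokens you can accumulate before compaction kicks in (hypothetical model)."""
    if not autocompact:
        # With compaction off, the whole window is usable.
        return window_tokens
    # With compaction on, the conversation is compacted once it reaches
    # the assumed fraction of the window.
    return int(window_tokens * threshold)


print(usable_context(1_000_000, autocompact=True))   # 500000
print(usable_context(1_000_000, autocompact=False))  # 1000000
```

That matches what I saw: a nominal 1 Million window behaving like 500k until Autocompact is switched off.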

vorticalbox 2 hours ago

You want to compact early, though: if you send the whole chat, you will end up with a lot of tokens not in the cache, which (1) costs way more and (2) slows the request down, as it has to process it all.
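
To put rough numbers on the cost point, here is a minimal sketch. The base rate and the cache-read discount are placeholder assumptions, not actual pricing, and `request_input_cost` is a made-up helper:

```python
def request_input_cost(
    total_tokens: int,
    cached_tokens: int,
    base_rate_per_mtok: float,
    cache_read_multiplier: float,
) -> float:
    """Input cost of one request when part of the prompt is served from cache.

    All rates are hypothetical placeholders chosen for illustration.
    """
    uncached_tokens = total_tokens - cached_tokens
    # Uncached tokens pay the full rate; cached tokens pay a discounted rate.
    cost_uncached = uncached_tokens * base_rate_per_mtok
    cost_cached = cached_tokens * base_rate_per_mtok * cache_read_multiplier
    return (cost_uncached + cost_cached) / 1_000_000


# Assumed numbers: 10.0 per million input tokens, cache reads at 10% of that.
cold = request_input_cost(700_000, 0, 10.0, 0.1)        # nothing cached
warm = request_input_cost(700_000, 700_000, 10.0, 0.1)  # everything cached
print(f"cold: {cold:.2f}, warm: {warm:.2f}")
```

Under those assumptions, a fully uncached 700k-token request costs about ten times as much as a fully cached one, so how much it hurts depends on how often the cache is actually hit.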

SyneRyder an hour ago

I do agree in cases where I'm using the API and not the subscription; this would be very costly there. Not sure why the tokens wouldn't be in the cache, though? It seems everything should be cached as long as I'm within the 1 hour caching window. If I'm wrong about how token caching works, I'm eager to learn!

My other concern is, it isn't really a 1 Million context window if we can only use the first 500k, right? But now that I've found that I can re-enable it, I'm happy.

I've previously had sessions go to 700k tokens and still be okay, though it does start drifting at that 700k point. I'm regularly at 300k with no problem.