deviation 4 days ago

This makes sense if we compare compute cost instead of hours.

Transformer self-attention costs scale roughly quadratically with context window size: a 32k-token window needs about (32k/8k)² = 16× the attention compute of an 8k-token window, so servicing prompts at 32k uses far more compute per request.

A Max 5× user on an 8k-token window might exhaust their cap in around 30 hours, while a Max 20× user on a 32k-token window exhausts theirs in about 35 to 39 hours, not the ~120 hours a naive 4× multiplier would predict.
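The arithmetic can be sketched as a toy model: per-request compute with a linear term (MLP layers, KV projections) plus a quadratic attention term. The 30% attention share, the baseline window, and the function names here are my assumptions, not anything Anthropic has published, and real hours also depend on how request volume changes with window size, which this ignores.

```python
# Toy cost model (assumed, not Anthropic's actual accounting): per-request
# compute = linear-in-n term + quadratic-in-n attention term.

def relative_cost(n_tokens: int, base: int = 8_000, attn_frac: float = 0.3) -> float:
    """Per-request compute at n_tokens, relative to a base-window request.

    attn_frac is the assumed share of compute spent on attention at the
    base window; everything else is assumed to scale linearly.
    """
    r = n_tokens / base
    return (1 - attn_frac) * r + attn_frac * r**2

def hours_to_cap(base_hours: float, cap_mult: float, cost_mult: float) -> float:
    """Hours before exhausting a cap cap_mult times the baseline, when each
    hour of use burns cost_mult times the baseline compute."""
    return base_hours * cap_mult / cost_mult

# A 32k window holds 4x the tokens of an 8k window, but a request costs
# more than 4x the compute because the attention term grows 16x:
cost_32k = relative_cost(32_000)   # 0.7*4 + 0.3*16 = 7.6

# So a 4x larger cap buys far fewer than 4x the hours at 32k:
print(hours_to_cap(base_hours=30, cap_mult=4, cost_mult=cost_32k))
```

With these assumed numbers the multiplier on hours collapses well below 4×; the exact figure is sensitive to the attention share, but the direction matches the observed gap.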

If you compact often and keep context windows small, I'd wager your Opus 4 consumption would approach the expected 4× multiplier. In reality, I assume the majority of users aren't clearing their context windows and are just letting auto-compact do its thing.

Visualization: https://codepen.io/Sunsvea/pen/vENyeZe