Remix.run Logo
_blk 3 hours ago

From what I understand you shouldn't wait more than 5min between prompts without compacting or clearing or you'll pay for reinitializing the cache. With compaction you still pay but it's less input tokens. (Is compaction itself free?)

gck1 an hour ago | parent | next [-]

Cache ttl on max subscriptions is 1h, FYI.

_blk an hour ago | parent [-]

That'd be awesome but it doesn't reflect what I see. Do you have a source for that? What I see is if take a quick break the session loses ~5% right at the start of the next prompt processing. (I'm currently on max 5x)

gck1 an hour ago | parent | next [-]

Not at my workstation right now, but simply ask claude to analyze jsonl transcript of any session, there are two cache keys there, one is 5m, another 1h. Only 1h gets set. There are also some entries there that will tell you if request was a cache hit or miss, or if cache rewrite happened. I've had claude test another claude and on max 5x subscription, cache miss only happened if message was sent after 1h, or if session was resumed using /resume or --resume (this is a bug that exists since January - all session resumes will cause a full cache rewrite).

However, cache being hit doesn't necessarily mean Anthropic won't just subtract usage from you as if it wasn't hit. It's Anthropic we're talking about. They can do whatever they want with your usage and then blame you for it.

Fabricio20 31 minutes ago | parent | prev | next [-]

I have heard that if you have telemetry disabled the cache is 5 minutes, otherwise 1h. No clue how true that is however my experience (with telemetry enabled) has been the 1h cache.

HarHarVeryFunny 2 minutes ago | parent [-]

They've acknowledged that as a bug and have fixed it.

ethanj8011 an hour ago | parent | prev [-]

It's true as far as I can tell, just by my own checking using `/status`. You can also tell by when the "clear" reminder hint shows up. Also if you look at the leaked claude code you can see that almost everything in the main thread is cached with 1H TTL (I believe subagents use 5 minute TTL)

trueno 2 hours ago | parent | prev | next [-]

is it 5 mins between constant prompting/work or 5 mins as in if i step away from the comp for 5 mins and comp back and prompt again im not subject to reinit?

if it's the latter that's crazy. i dont even know what to do there, compactions already feel like a memory wipe

conception 3 hours ago | parent | prev | next [-]

Yeah the caching change is probably 90% of “i run out of usage so fast now!” Issues.

hgoel 3 hours ago | parent | prev [-]

Ah I can see how my phrasing might be misleading, but these prompts were made within 5 minutes of each other, the timing I mentioned were what Claude spent working.