bcherny 2 days ago

Claude Code is the most prompt cache-efficient harness, I think. The issue is more that the larger the context window, the higher the cost of a cache miss.
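
To put rough numbers on that scaling, here is a minimal sketch. It assumes a miss simply re-bills the whole cached prefix at the uncached input rate instead of the cached-read rate. The function name and the per-million-token prices are hypothetical placeholders, not real pricing.

```python
def cache_miss_extra_cost(context_tokens: int,
                          uncached_per_mtok: float,
                          cached_per_mtok: float) -> float:
    """Extra spend for one cache miss under the simplified model above.

    Prices are per million tokens and are illustrative assumptions only.
    """
    # Difference between paying the full rate and the cached-read rate.
    price_gap = uncached_per_mtok - cached_per_mtok
    return context_tokens / 1_000_000 * price_gap


# Hypothetical prices: 3.00 uncached vs 0.30 cached per million tokens.
small = cache_miss_extra_cost(200_000, 3.00, 0.30)
large = cache_miss_extra_cost(1_000_000, 3.00, 0.30)
```

Under this model the penalty is linear in context size, so a 5x larger window makes every miss 5x more expensive.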

simsla 2 days ago | parent | next [-]

I do wonder if it's fair to expect users to absorb cache miss costs when using Claude Code, given how opaque these costs are.

beacon294 2 days ago | parent | prev | next [-]

Politely, no.

- I wrote an extension in Pi to warm my cache with a heartbeat.

- I wrote another to block submission after the cache has expired (heartbeats disabled or exhausted).

- I wrote a third to hard limit my context window.

- I wrote a fourth to handle cache control placement before forking context for fan out.

- My initial prompt was 1000 tokens, improving cache efficiency.
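
The heartbeat and block-on-expiry ideas above can be sketched roughly as follows. This is not the actual extension code and does not use Pi's extension API. `CacheWindow`, the 300-second TTL, and the 30-second heartbeat margin are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CacheWindow:
    """Tracks whether a prompt cache is still expected to be warm."""
    ttl_seconds: float = 300.0       # assumed cache lifetime
    heartbeat_margin: float = 30.0   # assumed lead time before expiry
    last_hit: Optional[float] = None

    def touch(self, now: float) -> None:
        # Record a request that read or refreshed the cache.
        self.last_hit = now

    def is_warm(self, now: float) -> bool:
        return self.last_hit is not None and now - self.last_hit < self.ttl_seconds

    def needs_heartbeat(self, now: float) -> bool:
        # Warm, but close enough to expiry that a keep-alive is due.
        if not self.is_warm(now):
            return False
        return now - self.last_hit >= self.ttl_seconds - self.heartbeat_margin

    def allow_submit(self, now: float) -> bool:
        # The first request has nothing to lose; later ones are blocked once cold.
        return self.last_hit is None or self.is_warm(now)
```

A caller would pass in `time.monotonic()` as `now`, send a heartbeat when `needs_heartbeat` is true, and refuse to submit when `allow_submit` is false.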

Anthropic is STOMPING on the diversity of use cases of their universal tool, see you when you recover.

yummytummy 2 days ago | parent | prev | next [-]

That might be, but the argument was that poor cache utilization was costing Anthropic too much money in other harnesses. If cache is counted toward rate limits, it doesn’t matter from a cost perspective: you’ll just hit your rate limits faster in other harnesses that don’t try to optimize for caching.

bcherny 2 days ago | parent [-]

There were two issues with some other 3p harnesses:

1. Poor cache utilization. I put up a few PRs to fix these in OpenClaw, but the problem is their users update to new versions very slowly, so the vast majority of requests continued to use cache inefficiently.

2. Spiky traffic. A number of these harnesses use un-jittered cron, straining services due to weird traffic shape. Same problem -- it's patched, but users upgrade slowly.

We tried to fix these, but in the end, it's not something we can directly influence on users' behalf, and there will likely be more similar issues in the future. If people want to use these they are welcome to, but subscription clients need to be more efficient than that.

SyneRyder 2 days ago | parent | next [-]

How much jitter would you prefer, how many seconds / minutes out? I have some morning tasks that run while I'm asleep via claude -p, and it sounds like I'm slightly contributing to your spikes (presumably hourly and on quarter hours).

Deathmax 2 days ago | parent [-]

There's prior art from Claude's own scheduled tasks' jitter: https://code.claude.com/docs/en/scheduled-tasks#jitter

> Recurring tasks fire up to 10% of their period late, capped at 15 minutes. An hourly job might fire anywhere from :00 to :06.

> One-shot tasks scheduled for the top or bottom of the hour fire up to 90 seconds early.
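
For a cron-driven script, the recurring-task rule quoted above could be approximated like this. It is a minimal sketch, assuming a uniform random delay. The function names are hypothetical, and this is not Claude's actual scheduler code.

```python
import random


def max_jitter_seconds(period_seconds: float,
                       fraction: float = 0.10,
                       cap_seconds: float = 15 * 60) -> float:
    """Upper bound on lateness: 10% of the period, capped at 15 minutes."""
    return min(period_seconds * fraction, cap_seconds)


def jitter_delay_seconds(period_seconds: float, rng=random) -> float:
    """Pick a delay between 0 and the bound (uniform is an assumption)."""
    return rng.uniform(0.0, max_jitter_seconds(period_seconds))


hourly_bound = max_jitter_seconds(60 * 60)       # 360 s, i.e. :00 to :06
daily_bound = max_jitter_seconds(24 * 60 * 60)   # capped at 900 s
```

A wrapper script would call `time.sleep(jitter_delay_seconds(period))` before launching the actual task, so jobs scheduled for the same minute spread out instead of firing together.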

dollspace 2 days ago | parent | prev [-]

If you give doll a list of things you want to see from third-party harnesses, a compliance checklist, it will make sure the one it is building follows it to the letter.

eastbound 2 days ago | parent | prev [-]

I’m sorry but when you wake up in the morning with 12% of your session used, saying “it’s the cache” is not an appropriate answer.

And I’m using Claude on a small module in my project; the automations that read more just to take up more context are a scam.