bcherny 5 hours ago

Hey all, Boris from the Claude Code team here.

We've been investigating these reports, and a few of the top issues we've found are:

1. Prompt cache misses when using the 1M token context window are expensive. Since Claude Code uses a 1 hour prompt cache window for the main agent, if you leave your computer for over an hour and then continue a stale session, it's often a full cache miss. To improve this, we have shipped a few UX improvements (e.g. nudging you to /clear before continuing a long stale session), and are investigating defaulting to 400k context instead, with an option to configure your context window up to 1M if preferred. To experiment with this now, try: CLAUDE_CODE_AUTO_COMPACT_WINDOW=400000 claude.

2. People pulling in a large number of skills, or running many agents or background automations, which sometimes happens when using a large number of plugins. This was the case for a surprisingly large number of users, and we are actively working on (a) improving the UX to make these cases more visible to users and (b) more intelligently truncating, pruning, and scheduling non-main tasks to avoid surprise token usage.
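To make the economics of point 1 concrete, here is a rough sketch in Python. The absolute prices are illustrative assumptions, not official numbers; the point is the roughly 10x gap between cached and uncached input tokens described on Anthropic's pricing page:

```python
# Rough sketch: input cost of resuming a stale session vs. a warm one.
# Prices here are assumptions for illustration: $3 per 1M input tokens
# uncached, with cache reads billed at ~0.1x the base input rate.
BASE_INPUT_PER_MTOK = 3.00
CACHE_READ_MULTIPLIER = 0.1

def resume_cost(context_tokens: int, cache_hit: bool) -> float:
    """Approximate input cost (USD) of sending one turn with the
    full conversation context attached."""
    rate = BASE_INPUT_PER_MTOK * (CACHE_READ_MULTIPLIER if cache_hit else 1.0)
    return context_tokens / 1_000_000 * rate

warm = resume_cost(800_000, cache_hit=True)    # cache still alive
cold = resume_cost(800_000, cache_hit=False)   # stale session: full miss
print(f"warm: ${warm:.2f}, cold: ${cold:.2f}, ratio: {cold / warm:.0f}x")
```

With these assumed numbers, one resumed turn on a cold 800k-token session costs about ten times what the same turn costs while the cache is warm, which is why a long break followed by "continue" feels so expensive.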

In the process, we ruled out a large number of hypotheses: adaptive thinking, other kinds of harness regressions, model and inference regressions.

We are continuing to investigate and prioritize this. The most actionable thing for people running into this is to run /feedback, and optionally post the feedback ids either here or in the Github issue. That makes it possible for us to debug specific reports.

reenorap 5 hours ago | parent | next [-]

Boris, you're seeing a ton of anecdotes here and Claude has done something that has affected a bunch of their most fervent users.

Jeff Bezos famously said that if the anecdotes are contradicting the metrics, then the metrics are measuring the wrong things. I suggest you take the anecdotes here seriously and figure out where/why the metrics are wrong.

toddmorey 5 hours ago | parent | next [-]

On the subject of metrics, better user-facing metrics to understand and debug usage patterns would be a great addition. I'd love an easier way to understand the average cost incurred by a specific skill, for example. (If I'm missing something obvious, let me know.)

Baking deeper analytics into CC would be helpful... similar to ccusage perhaps: https://github.com/ryoppippi/ccusage

bcherny 5 hours ago | parent | prev [-]

We are taking it seriously, and are continuing to investigate. We are not trusting the metrics.

stevenae 2 hours ago | parent | next [-]

The quantitative UX research team at Google was created for exactly this problem: a service that became popular before the right metrics existed, meaning metrics had to be derived first, then optimized. We would observe users (IRL), read their logs, generate experiments to improve the behavior as measured by logs, then return to see whether the experiment improved the IRL experience. There were not many of us, and we're still around :)

blks an hour ago | parent | prev | next [-]

Hopefully yourself, and not via your ai tools.

reenorap 4 hours ago | parent | prev | next [-]

Thank you

Ucalegon 5 hours ago | parent | prev [-]

Cool, are you going to be transparent and explain the metrics and costs as a postmortem? And given the inability to actually audit what you produce, why should we trust Anthropic?

edmundsauto 2 hours ago | parent | next [-]

HN sometimes talks about pathological customers who will never be happy. Boris is probably the single best rep in the community, possibly ever.

The way your tone and complaints come across reminds me of this. As a paying customer ($5k spend per month in my corporate job), I’d rather Anthropic keep doing what they’re doing — innovating and shipping useful stuff at blinding speed — and not index on your feedback. I think what indexing on it would cost far outweighs the consequences of not doing so.

mrcwinn 4 hours ago | parent | prev | next [-]

It's incredible that Boris is here on HN being open and sharing an issue they don't fully understand yet, and offering a possible workaround. CTFO.

Thank you Boris.

Ucalegon 3 hours ago | parent [-]

I am sorry you feel this way, but the reality of the situation is that there is zero reason to trust anything Anthropic or Boris says. They have no legal liability or obligation to tell the truth, besides brand risk, which for people like you is mitigated by a single person showing up, posting, and that's it.

nickandbro 5 hours ago | parent | prev | next [-]

Dang man, chill.

Ucalegon 5 hours ago | parent [-]

Man, expecting the minimum from companies who are supposed to deliver a pro... there is no SLA for any of this, so you are right.

Also, why is there no SLA?

946789987649 4 hours ago | parent | next [-]

because there isn't one and people still paid for it.

My clients demand one, so there is one.

Ucalegon 3 hours ago | parent [-]

Imagine if people were like your clients.

alpha_squared 4 hours ago | parent | prev [-]

Because this is ultimately a beta service. The whole industry is.

Ucalegon 4 hours ago | parent [-]

Wait, where is the 'beta' tag on something that they are charging real money for? Why is this software any different from any other software, such that we should completely give away our rights as consumers to ensure what we pay for is delivered?

layer8 3 hours ago | parent | next [-]

I think the parent is saying that one should be aware that the whole LLM industry is still in an experimental stage and far from mature. What you want isn’t what’s being offered. I agree that there should be higher standards, but what we currently have is an arms race. The consequence is to factor that into the value proposition and maybe not rely too much on it.

Ucalegon 3 hours ago | parent [-]

SLAs should be standard for any paid service, especially on the enterprise side, but also on the consumer side. Being immature as a company does not excuse a lack of service delivery.

otterley an hour ago | parent [-]

Not every customer, even a paying customer, demands reliability at a particular level. Market segmentation tends to address those situations: pay more, get more.

otterley 4 hours ago | parent | prev [-]

What right as a consumer do you have that is pertinent here, other than to have the vendor adhere to the terms of the agreement you have with them?

Anthropic has many customers despite the fact that they have occasional problems. They’re not suing Anthropic because Anthropic isn’t promising in its agreement something they can’t deliver.

I think you’re reading into the agreement something that isn’t there, and that’s the cause of your confusion.

Ucalegon 3 hours ago | parent [-]

I am not reading into an agreement; I am saying there is no agreement to be found that ensures service delivery and the associated liability that would come with any SLA. Also, where is the Anthropic SLA for Enterprise?

Does it exist?

Just because people pay for things doesn't mean they know or understand what they are paying for. Nor is there the legal precedent to actually understand where the rub lies or how that impacts business.

otterley an hour ago | parent [-]

> Just because people pay for things doesn't mean they know or understand what they are paying for.

I believe, respectfully, that’s precisely what is happening in this thread because you keep complaining about the absence of an SLA that was never in the agreement, as though it is—or is supposed to be—there, and therefore the existence of some “rights” that would flow from that.

amirhirsch 5 hours ago | parent | prev [-]

Dude is on Hacker News on a Sunday. Half the GDP of the world is competing with him. What metrics would you like to see?

Ucalegon 5 hours ago | parent [-]

An enforceable SLA for the services that Anthropic offers, rather than having an employee respond to things on a Sunday.

roamerz 4 hours ago | parent | next [-]

>> rather than putting an employee to respond to things on Sunday.

Maybe, just maybe, they didn’t put him here; rather, he’s just a normal guy who reads HN, who is passionate about his role, and is here on his own time.

Ucalegon 4 hours ago | parent [-]

Maybe... maybe... maybe... none of this builds trust when there is something that does build trust: putting revenue on the line and opening yourself to legal liability. Otherwise everything is empty and meaningless; it's just PR, and nothing more.

otterley 4 hours ago | parent | prev | next [-]

Then you should offer to pay them for one. I’m sure they’d love to hear from you, and they could probably deliver one to you for the right price. But it will be a high price.

Ucalegon 3 hours ago | parent [-]

They don't offer ZDR [0] for files, even if you have a BAA or are dealing with HIPAA data, no matter how much you pay them. Trust me, we have tried.

[0] https://code.claude.com/docs/en/zero-data-retention

otterley an hour ago | parent [-]

I’m really confused. We were talking about SLAs, not other product features. Are you moving the goalposts?

aenis 4 hours ago | parent | prev [-]

Boring corporate AI will surely come, but hey, let's enjoy the wild west while it lasts. I am grateful to see Boris come here to address problems people face. I'm 100% sure nobody is making him - he has one of the coolest jobs in the world.

Ucalegon 4 hours ago | parent [-]

>he has one of the coolest jobs in the world.

So that means we just eject any critical thinking when it comes to companies, especially where there is no liability or obligation for them (Boris or Anthropic) to be honest.

Other than 'trust'.

rawicki 5 hours ago | parent | prev | next [-]

For me, definitely the worst regression was the system prompt telling Claude to analyze a file to check if it's malware on every read. That correlates with me also seeing quotas exhausted early, and acknowledgments of "not malware" at almost every step.

It is a horrible error of judgement to insert a complex request into such a basic ability. It is also an error of judgement to let Claude decide whether it wants to improve the code at all.

It is so bad that I stopped working on my current project and went to try other models. So far Qwen is quite promising.

bcherny 5 hours ago | parent [-]

I don't think that's accurate. The malware prompt has been around since Sonnet 3.7. We carefully evaled it for each new model release and found no regression in intelligence, alongside improved scores for cyber risk. That said, we have removed the prompt for Opus 4.6, since that model no longer needs it.

rawicki 5 hours ago | parent [-]

I started seeing "not malware, continuing" in almost every reply starting around 2 weeks ago. Maybe you just reintroduced it with some regression? Opus 4.6.

bcherny 5 hours ago | parent | next [-]

That's weird. Would you mind running /feedback and sharing the id here next time you see this? I'd love to debug

rawicki 5 hours ago | parent | next [-]

Sure, I really appreciate you looking at this.

a6edd0d1-a9ed-4545-b237-cff00f5be090 / https://github.com/anthropics/claude-code/issues/47027

I'm happy to provide any other info that can be useful (as long as I'm not sharing any information about the code or tools we use in a public GitHub issue).

bcherny 4 hours ago | parent | next [-]

Thanks for the report! This was fixed in v2.1.92.

Please:

1. Upgrade to the latest: claude update (seems like you did this already)

2. Start a new conversation (resuming an old convo may trigger this bug again in that convo)

egamirorrim an hour ago | parent | next [-]

This is bloody great Boris. Thank you.

bcherny 4 hours ago | parent | prev [-]

Thank you! Looking

obrajesse 5 hours ago | parent | prev [-]

I’ve seen this a couple of times recently. Including right after compact. I’ll /feedback it next time I see it

bavell 5 hours ago | parent | prev | next [-]

I've been using CC a decent amount the past few weeks and have never seen this malware stanza...?

echelon 5 hours ago | parent | prev [-]

1. I've never seen this. Is there a config option to unhide it if it's happening? Is this in Claude Code? Does it have to be set to verbose or something?

2. Can we pay more/do more rigorous KYC to disable it if it's active?

bcherny 5 hours ago | parent [-]

This warning is not enabled for modern models. No action needed. I'm digging into the report above as soon as they're able to /feedback.

mvkel 5 hours ago | parent | prev | next [-]

Why did this become an issue seemingly overnight when 1M context has been available for a while, and I assume prompt caching behavior hasn't changed?

EDIT: prompt caching behavior -did- change! 1hr -> 5min on March 6th. I'm not sure how starting a fresh session fixes it, as it's just rebuilding everything. Why even make this available?

It feels like the rules changed and the attitude from Anth is "aw I'm sorry you didn't know that you're supposed to do that." The whole point of CC is to let it run unattended; why would you build around the behavior of watching it like a hawk to prevent the cache from expiring?

bcherny 5 hours ago | parent [-]

> 1hr -> 5min on March 6th

This is not accurate. The main agent typically uses a 1h cache (except for API customers, who can enable 1h, but it is not on by default because it costs more). Sub-agents typically use a 5m cache.
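For API users wondering how the two TTLs map onto requests: per Anthropic's prompt-caching docs, cache breakpoints are marked with a `cache_control` field, and the longer 1-hour lifetime is requested with a `ttl` value (behind a beta header at the time of writing, omitted here). A minimal sketch of the request body, with the model id and prompt text as placeholders:

```python
# Sketch of a Messages API request body with a 1-hour cache breakpoint.
# Field names follow Anthropic's prompt-caching docs; the client call
# and beta-header plumbing around this payload are omitted.
long_system_prompt = "...many thousands of tokens of project context..."

request_body = {
    "model": "claude-sonnet-4-5",  # placeholder model id
    "max_tokens": 1024,
    "system": [
        {
            "type": "text",
            "text": long_system_prompt,
            # "ephemeral" with no ttl -> the default 5-minute cache;
            # adding "ttl": "1h" requests the extended 1-hour lifetime.
            "cache_control": {"type": "ephemeral", "ttl": "1h"},
        }
    ],
    "messages": [{"role": "user", "content": "Summarize the project."}],
}
```

Everything up to and including the breakpoint is cached; each later request that reuses that exact prefix is billed at the (cheaper) cache-read rate instead of the full input rate.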

throwdbaaway 5 hours ago | parent | next [-]

https://github.com/anthropics/claude-code/issues/46829#issue... - Have you checked with your colleague? (and his AI, of course)

fluidcruft 4 hours ago | parent [-]

Doesn't what's said at the link approximately agree? The 5m bug was said to be isolated to use of overage (API billing).

highd an hour ago | parent | prev | next [-]

... so how do API users enable 1hr caching? I haven't found a setting anywhere.

aaronblohowiak 5 hours ago | parent | prev [-]

So if I run a test suite or compile my Rust program in a sub-agent, I’m going to get cache misses? Boo.

skeledrew 3 hours ago | parent [-]

Sub agents don't have much context and don't stay around for long, so misses in that case are trivial.

mmd45 10 minutes ago | parent | prev | next [-]

Shouldn't compaction be interactive with the user as to what context will continue to be the most relevant in the future? What if the harness allowed a turn to clarify the user's expected future direction of the conversation and did the consolidation based on that additional info?

There definitely seems to be a benefit to pruning the context and keeping the signal-to-noise ratio high wrt what is still to be discussed.

8note an hour ago | parent | prev | next [-]

> Since Claude Code uses a 1 hour prompt cache window for the main agent

This seems a bit awkward vs the 5 hour session windows.

If I get rate limited once, will I get rate limited immediately again on the same chat when the rate limit ends?

Any chance we can get some form of deferred cache, so anything on a rate-limited account gets put aside until the rate limit ends?

j-pb 4 hours ago | parent | prev | next [-]

The /clear nudge isn't a solution though. Compacting or clearing just means rebuilding context until Claude is actually productive again. The cost comes either way. I get that 1M context windows cost more than the flat per-token price reflects, because attention scales with context length, but the answer to that is honest pricing or not offering it. Not annoying UX nudges. What’s actually indefensible is that Claude is already pushing users to shrink context via, I presume, system prompt. At maybe 25% fill:

  “This seems like a good opportunity to wrap it up and continue in a fresh context window.”
  “Want to continue in a fresh context window? We got a lot of work done and this next step seems to deserve a fresh start!”
If there’s a cost problem, fix the pricing or the architecture. But please stop the model and UI from badgering users into smaller context windows at every opportunity. That is not a solution, it’s service degradation dressed as a tooltip.
denysvitali 5 hours ago | parent | prev | next [-]

OpenAI (Codex) keeps resetting the usage limits each time they fuck up...

I have yet to see Anthropic do the same. Sorry, but this whole thing seems quite on purpose.

losteric 5 hours ago | parent | next [-]

It doesn’t seem like Anthropic is fucking up?

I use Claude Code about 8hrs every work day extensively, and have yet to see any issues.

It really does seem like PEBKAC.

mlinsey 5 hours ago | parent | next [-]

Different users do seem to be encountering problems or not based on their behavior, but for a rapidly-evolving tool with new and unclear footguns, I wouldn't characterize that as user error.

For example, I don't pull in tons of third-party skills, preferring to have a small list of ones I write and update myself, but it's not at all obvious to me that pulling in a big list of third-party skills (like I know a lot of people do with superpowers, gstack, etc...) would cause quota or cache miss issues, and if that's causing problems, I'd call that more of a UX footgun than user error. Same with the 1M context window being a heavily-touted feature that's apparently not something you want to actually take advantage of...

denysvitali 5 hours ago | parent | prev | next [-]

My colleagues and I have faced the same issues over the last month or so.

With a new version of Claude Code pretty much every day, constant changes to their usage rules (2x outside of peak hours, temporarily 2x for a few weeks, ...), hidden usage decisions (past 256k it looks like your usage consumes your limits faster), and model degradation (Opus 4.6 is now worse than Opus 4.5, as many have reported), I fail to see how it can be a user error.

The only user error I see here is still trusting Anthropic to be on the good side tbh.

If you need to hear it from someone else: https://www.youtube.com/watch?v=stZr6U_7S90

bcherny 5 hours ago | parent [-]

> past 256k it looks like your usage consumes your limits faster

This is false. My guess is what is happening is #1 above, where restarting a stale session causes a 256k cache miss.

That said, I hear the frustration. We are actively working on improving rate limit predictability and visibility into token usage.

tetraodonpuffer 4 hours ago | parent [-]

Just like everybody else, my colleagues at work and I have seen major regressions in available usage over the past month, seemingly unrelated to caching/resuming. On an enterprise sub, doing the same work, I personally went from having several sessions running concurrently without hitting limits, to having only one session at a time and hitting my 5h limit twice a day in 3-4 hours tops (and due to the apparent lower intelligence I have been at the terminal watching what Opus is doing like a hawk, so it's not a case of going for coffee and coming back to a cold cache). The first day I ever hit my 5h limit this year was the day everybody reported it (I think it was the Monday you introduced the 2x promotion after hours? Not sure, like 3 weeks ago?)

To avoid 1M issues, this week I have also intentionally used the 256k context model, disabled adaptive thinking, and done the same "plans in multiple short steps with /clear in-between" to minimize context usage, and yet nothing helps. It just feels like ~2x to ~3x fewer tokens than before, and a lot less smart than in February.

Nowadays, every time I complete a plan I spend several sessions afterwards saying things like "we have done plan X, the changes are uncommitted, can you take a look at what we did", and every time it finds things that were missed, or outright (bad) shortcuts/deviations from the plan, despite my settings.json having a clear "if in doubt, ask the user; don't just take the easy way out". As a random data point, just today Opus halfway through a session told me to make a change to code inside a pod and then rollout-restart it to use said change, and when called out on it, it of course said that I was right and that of course that wouldn't work...

It is understandable that, given your incredible growth, you are between a rock and a hard place and have to tweak limits (compute does not grow on trees), but the consistent "you are holding it wrong" messaging is not helpful. I am wondering if realistically your only option is to move everybody to metered billing, with clear token usage displayed, and maybe have Pro/Max 5x/Max 20x just be "your first $x of tokens is 50/75% off". Allow folks to tweak the thinking budget, change the system prompt to remove things like "try the easy solution first" (which anecdotally has been introduced in the past while), and allow users to verify, per prompt, whether the whole context would be resent or the cache is still available.

mvkel 5 hours ago | parent | prev | next [-]

Why did it suddenly become an issue, despite prompt caching behavior being unchanged?

ScoobleDoodle 5 hours ago | parent | prev | next [-]

PEBKAC: Problem Exists Between Keyboard And Chair

extr 5 hours ago | parent | prev | next [-]

Yes same here. I use CC almost constantly every day for months across personal and work max/team accounts, as well as directly via API on google vertex. I have hardly ever noticed an issue (aside from occasional outages/capacity issues, for which I switch to API billing on Vertex). If anything it works better than ever.

varispeed 4 hours ago | parent | prev [-]

You know that people are not all using the same resources, right? It's like 9 out of 10 computers get borked, you have the 1 that seems okay, and you essentially say "My computer works fine, therefore all computers work fine." Come on, dude.

weird-eye-issue 5 hours ago | parent | prev | next [-]

Can you clearly state what they messed up?

nodja 4 hours ago | parent [-]

Not parent but I can guess from watching mostly from the sidelines.

They introduced a 1M context model semi-transparently without realizing the effects it would have, then refused to "make it right" with the customer, which is a trait most people expect from a business when they spend money on it, especially in the US, and especially when the money spent is often in the thousands of dollars.

Unless Anthropic has some secret sauce, I refuse to believe that their models perform anywhere near the same on >300k context sizes as they do on 100k. People don't realize it, but even a small drop in success rate becomes very noticeable if you're used to having near 100%, i.e. 99% -> 95% is more noticeable than 55% -> 50%.
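The noticeability point is easier to see in terms of failure rates, sketched here with the numbers from the comment:

```python
# A drop from 99% -> 95% success quintuples the failure rate,
# while 55% -> 50% barely changes it in relative terms.
def failure_ratio(before: float, after: float) -> float:
    """How many times more often things fail after the drop."""
    return (1 - after) / (1 - before)

print(failure_ratio(0.99, 0.95))  # failures become ~5x more frequent
print(failure_ratio(0.55, 0.50))  # failures become ~1.1x more frequent
```

If you were hitting one bad turn in a hundred and now hit one in twenty, you notice immediately; the already-unreliable case feels about the same.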

I got my first Claude sub last month (it expires in 4 days) and I've used it on some biggish projects with opencode. It went from compacting after 5-10 questions to just expanding the context window. I personally notice it deteriorating somewhere between 200-300k tokens, and I either fork a previous context or start a new one after that, because at that size even compacting seems to generate subpar summaries. It currently no longer works with opencode, so I can't attest to how well it worked over the past week or so.

If the 1M model introduction is at fault for this mass user perception that the models are getting worse, then it's Anthropic's fault for introducing confusion into the ecosystem. Even if there were zero problems and the 1M model was perfect, if your response when users complain is to blame the user, don't expect the user to be happy. Nobody wants to hear "you're holding it wrong", but it seems that Anthropic is trying to be the Apple of LLMs in all the wrong ways as well.

atonse 4 hours ago | parent | next [-]

I still love Claude and nothing but a ton of respect for Boris and the team building such a phenomenal product.

That said, I feel that things started to feel a bit off usage-wise after the introduction of 1M context.

I'd personally be happy to disable it and go back to auto-compacting because that seems to have been the happy medium.

logicchains 4 hours ago | parent | prev [-]

Especially since Codex faced the same issue but the team decided to explicitly default to only ~200k context to avoid surprises and degradation for users.

Madmallard 5 hours ago | parent | prev [-]

Money money money money

throwaway2027 5 hours ago | parent | prev | next [-]

I don't want a nudge. I want a clear RED WARNING with "You've gone away from your computer a bit too long and chatted too much at the coffee machine. You're better off starting a new context!"

bcherny 5 hours ago | parent | next [-]

Ack, it is currently blue but we can make it red

SpaceNoodled 4 hours ago | parent | prev [-]

Why is nobody even asking why that should be an issue? No other text editor shits the bed that way. The whole point of the computer is that it patiently waits for my input.

GeoAtreides 4 hours ago | parent [-]

Let me put it this way: not your RAM, not your cache, not waiting patiently for your input.

avree 5 hours ago | parent | prev | next [-]

Hey Boris - why is the best way to get support making a Hacker News or X post, and hoping you reply? Why does Anthropic Enterprise Support never respond to inquiries?

egamirorrim 41 minutes ago | parent [-]

I mean if we're building an unrelated wishlist... Can 20x max users get auto mode already? Or can the enterprise plans get something equivalent to 20x max?

Given I'm running two max accounts to get the usage I want, can we get a 25x and 40x tier? :-)

brokencode 5 hours ago | parent | prev | next [-]

Would it be possible to increase the cache duration if misses are a frequent source of problems?

Maybe using a heartbeat to detect live sessions to cache longer than sessions the user has already closed. And only do it for long sessions where a cache miss would be very expensive.
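A client-side version of this idea is possible today, since (per the caching docs) any request that hits the cache resets its lifetime. A rough sketch of the decision logic, where the TTL, the margin, and the whole `keep_cache_warm` helper are hypothetical choices, not anything Claude Code actually does:

```python
# Decide when a session should send a cheap keep-alive request to
# refresh the prompt cache before it expires. TTL and margin are
# assumptions: a 1-hour main-agent cache, refreshed 5 min early.
CACHE_TTL_S = 60 * 60      # assumed cache lifetime
PING_MARGIN_S = 5 * 60     # refresh this long before expiry

def keep_cache_warm(last_request_ts: float, now: float) -> bool:
    """Return True when a keep-alive should be sent: the cache is
    close to expiring but has not expired yet."""
    age = now - last_request_ts
    return CACHE_TTL_S - PING_MARGIN_S <= age < CACHE_TTL_S

# 56 minutes idle -> time to ping; 61 minutes -> already too late.
print(keep_cache_warm(0, 56 * 60), keep_cache_warm(0, 61 * 60))
```

Note that each ping is itself billed as a cache read over the full context, so it only pays off up to a break-even number of refreshes; it's not free on subscription limits either.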

bcherny 5 hours ago | parent [-]

Yes, we're trying a couple of experiments along these lines. Good intuition.

apgwoz 2 hours ago | parent | prev | next [-]

As another data point, I pay for Pro for a personal account, and use no skills, do nothing fancy, use the default settings, and am out of tokens, with one terminal, after an hour. This is typically working on a < 5,000 line code base, sometimes in C, sometimes in Go. Not doing incredibly complicated things.

yummytummy 5 hours ago | parent | prev | next [-]

Ah, so cache usage impacts rate limits. There goes the "other harnesses aren't utilizing the cache as efficiently" argument.

bcherny 5 hours ago | parent [-]

Claude Code is the most prompt cache-efficient harness, I think. The issue is more that the larger the context window, the higher the cost of a cache miss.

simsla an hour ago | parent | next [-]

I do wonder if it's fair to expect users to absorb cache-miss costs when using Claude Code, given how opaque these are.

yummytummy 5 hours ago | parent | prev | next [-]

That might be, but the argument was that poor cache utilization was costing Anthropic too much money in other harnesses. If cache is considered in rate limits, it doesn't matter from a cost perspective; you'll just hit your rate limits faster in other harnesses that don't try to cache-optimize.

bcherny 5 hours ago | parent [-]

There were two issues with some other 3p harnesses:

1. Poor cache utilization. I put up a few PRs to fix these in OpenClaw, but the problem is their users update to new versions very slowly, so the vast majority of requests continued to use cache inefficiently.

2. Spiky traffic. A number of these harnesses use un-jittered cron jobs, straining services with a weird traffic shape. Same problem: it's patched, but users upgrade slowly.

We tried to fix these, but in the end it's not something we can directly influence on users' behalf, and there will likely be more similar issues in the future. If people want to use these harnesses they are welcome to, but subscription clients need to be more efficient than that.
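For anyone running their own automations against the API, the un-jittered-cron problem has a standard fix: add a random delay before each scheduled run so that many clients sharing the same schedule don't all fire at the same instant. A minimal sketch (the 10% jitter fraction is an arbitrary assumption):

```python
import random

def jittered_delay(base_period_s: float, jitter_fraction: float = 0.1) -> float:
    """Random delay to add before a scheduled run, so clients with the
    same cron expression spread out instead of hitting the API at once."""
    return random.uniform(0, base_period_s * jitter_fraction)

# An hourly job starts somewhere in the first 6 minutes of the hour
# rather than exactly at :00.
delay = jittered_delay(3600)
print(f"sleeping {delay:.0f}s before this run")
```

In a shell wrapper this amounts to sleeping for the computed delay before invoking `claude -p`.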

SyneRyder 4 hours ago | parent | next [-]

How much jitter would you prefer, how many seconds / minutes out? I have some morning tasks that run while I'm asleep via claude -p, and it sounds like I'm slightly contributing to your spikes (presumably hourly and on quarter hours).

dollspace 3 hours ago | parent | prev [-]

If you give doll a list of things you want to see from third-party harnesses, a compliance checklist, it will make sure the one it is building follows it to the letter.

eastbound 5 hours ago | parent | prev | next [-]

I’m sorry, but when you wake up in the morning with 12% of your session used, saying “it’s the cache” is not an appropriate answer.

And I’m using Claude on a small module of my project; the automations that read more and take up more context are a scam.

beacon294 4 hours ago | parent | prev [-]

Politely, no.

- I wrote an extension in Pi to warm my cache with a heartbeat.

- I wrote another to block submission after the cache expired (heartbeats disabled or run out)

- I wrote a third to hard limit my context window.

- I wrote a fourth to handle cache control placement before forking context for fan out.

- my initial prompt was 1000 tokens, improving cache efficiency.

Anthropic is STOMPING on the diversity of use cases of their universal tool, see you when you recover.

fps-hero 5 hours ago | parent | prev | next [-]

Am I so out of touch?

No! It’s the children who are wrong!

accounting2026 4 hours ago | parent | prev | next [-]

While off-topic: thanks a lot for Claude Code, and also thanks to you/Anthropic for the responsible decisions regarding Mythos, even in the face of allegations of hype etc. It is much appreciated by many!

yumraj 3 hours ago | parent | prev | next [-]

> Since Claude Code uses a 1 hour prompt cache window for the main agent, if you leave your computer for over an hour then continue a stale session, it's often a full cache miss. To improve this, we have shipped a few UX improvements (eg. to nudge you to /clear before continuing a long stale session), and are investigating defaulting to 400k context instead

I don’t understand this. I frequently take long breaks. I never want to clear or even compact, because I don’t want to lose the conversations I’ve had and the context. Clearing etc. causes other issues: I have to restate everything at times, and it misses things. I do try to update the memory, which helps. I wish there were a better solution than a time-bound cache.

cowwoc2020 an hour ago | parent [-]

Makes me wish that shortly before the server-side expiration, we could save the cache on the client-side, indefinitely.

But my understanding is that we're talking about ~60GB of data per session, so it sounds unrealistic to do...

samuelknight 5 hours ago | parent | prev | next [-]

Have you considered poking the cache?

When a user walks away during the business day but CC is sitting open, you can refresh that cache up to 10x before it costs the same as a full miss. Realistically it would be <8x in a working day.
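The arithmetic behind that 10x figure, assuming Anthropic's published multiplier of roughly 0.1x the base input price for cache reads (each "poke" is one cache read over the full context; a miss is one full-price read of the same context):

```python
# Break-even for keeping the prompt cache warm vs. letting it expire.
# Assumption: cache reads cost ~0.1x the base input-token price.
CACHE_READ_MULTIPLIER = 0.1

def pokes_before_break_even(read_multiplier: float = CACHE_READ_MULTIPLIER) -> float:
    """Number of cache-refreshing requests that together cost as much
    as one full cache miss over the same context."""
    return 1.0 / read_multiplier

print(pokes_before_break_even())  # ~10 pokes per avoided miss
```

With a 1-hour TTL you would need at most one poke per idle hour, so a normal working day stays comfortably under the break-even.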

999900000999 5 hours ago | parent | prev | next [-]

You've created quite a conundrum.

The only people who are going to run into issues are superpower users who are running this excessively beyond any reasonable measure.

Most people are going to be quite happy with your service. But at the same time, and this is just human nature, people are 10 times more likely to complain about an issue than to compliment something working well.

I don't know how to fix this, but I strongly suspect this isn't really a technical issue. It's more of a customer support one.

danmaz74 4 hours ago | parent | prev | next [-]

Could we get an option to use Opus with a smaller context window? I noticed that results get much worse well before you reach 1M tokens, and I would love a setting to force a compaction at e.g. 300k tokens.

SyneRyder 4 hours ago | parent [-]

You probably just missed it in his post, but:

"To experiment with this now, try: CLAUDE_CODE_AUTO_COMPACT_WINDOW=400000 claude."

Maybe try changing the 4 to a 3 and see if that works for you?

KronisLV 3 hours ago | parent | prev | next [-]

> defaulting to 400k context instead, with an option to configure your context window to up to 1M if preferred

This seems really useful!

I'm surprised that "Opus 4.6" (200K) and "Opus 4.6 1M" are the only Opus options in the desktop app, whereas in the CLI/TUI app you don't seem to even get that distinction.

I bet that for a lot of folks something like 400k, 600k or 800k would work as better defaults, based on whatever task they want to work on.

ramon156 5 hours ago | parent | prev | next [-]

Boris, wasn't this the same thing ~2 weeks ago? Are these the same cache misses as before? What's the expected time until it's solved? It seems like it's taking a while.

ahofmann 3 hours ago | parent | prev | next [-]

Resizing the context window seems like a very good idea to me. I noticed a decline in productivity when the 1M context window was released, and I'd like to bring it back to 200k, because that was totally fine for the things I was working on.

g4cg54g54 2 hours ago | parent | prev | next [-]

From looking at the raw requests, that can't be right?

It's all "cache_control": { "type": "ephemeral" }; there is no "ttl" anywhere.

// edit: cc_version=2.1.104.f27
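For context on that observation: Anthropic's prompt-caching docs describe an explicit `ttl` field on the `cache_control` block for the 1-hour tier (historically gated behind a beta header). Field names here are taken from those docs as of this writing and may change; the payloads are hard-coded for illustration:

```python
# Shape of the two cache_control variants per Anthropic's prompt-caching
# docs (assumption: docs are current). A bare "ephemeral" block defaults
# to the 5-minute TTL, which would explain the raw requests quoted above.
five_minute = {"cache_control": {"type": "ephemeral"}}
one_hour = {"cache_control": {"type": "ephemeral", "ttl": "1h"}}

print(one_hour["cache_control"].get("ttl"))  # 1h
print(five_minute["cache_control"].get("ttl"))  # None
```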

hughw 5 hours ago | parent | prev | next [-]

Where can I learn about concepts like prompt cache misses? I don't have a mental model of how that interacts with my context of 1M or 400k tokens. I can cargo-cult the instructions, of course, but help us understand if you can, so we can intelligently adapt our behavior. Thanks.

bcherny 5 hours ago | parent | next [-]

The docs are a good place to start: https://platform.claude.com/docs/en/build-with-claude/prompt...
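To build intuition, here is a toy model of prefix caching (an illustrative sketch, not Anthropic's actual implementation): the server caches the processed conversation prefix, keyed by its content, for a limited TTL. Resume an identical prefix within the TTL and you pay only for new tokens; after the TTL, the whole prefix must be reprocessed, which is a full cache miss.

```python
import hashlib

TTL_SECONDS = 3600            # assumed 1h window, per the parent comment
cache: dict[str, float] = {}  # prefix fingerprint -> time last cached

def send(prefix: str, now: float) -> str:
    """Simulate one request whose conversation prefix may be cached."""
    key = hashlib.sha256(prefix.encode()).hexdigest()
    cached_at = cache.get(key)
    cache[key] = now  # in this model, reads refresh the entry
    if cached_at is not None and now - cached_at < TTL_SECONDS:
        return "cache hit: pay only for tokens after the prefix"
    return "cache miss: reprocess the entire prefix"

history = "system prompt + all prior turns..."
print(send(history, now=0))            # cache miss (first request)
print(send(history, now=1800))         # cache hit (30 min later)
print(send(history, now=1800 + 4000))  # miss again (>1h since refresh)
```

The larger the accumulated context (up to 1M tokens), the more expensive that final miss is, which is why a stale session hurts and why /clear before resuming helps.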

snthpy 4 hours ago | parent [-]

Thanks. Just noting that those docs say the cache duration is 5 min and not 1 hour as stated in sibling comment:

> By default, the cache has a 5-minute lifetime. The cache is refreshed for no additional cost each time the cached content is used.
>
> If you find that 5 minutes is too short, Anthropic also offers a 1-hour cache duration at additional cost.

yoaviram 23 minutes ago | parent [-]

Apparently Anthropic downgraded the cache TTL to 5 minutes without telling anyone. My biggest issue with the recent Claude Code problems is the lack of transparency, although it looks like even Boris doesn't know about this one: https://news.ycombinator.com/item?id=47736476

hughw 5 hours ago | parent | prev [-]

And why does /clear help things? Doesn't that wipe out the history of that session? Jeez.

docheinestages 5 hours ago | parent | prev | next [-]

Why are you suddenly running into so many issues like this? Could it be that all of Anthropic's employees have completely unlimited and unbounded accounts, which means you don't get a feel for how changes will affect customers?

bcherny 5 hours ago | parent | next [-]

The number of people using Claude Code has grown very quickly, which means:

- More configurations and environments we need to test

- Given an edge/corner case, it is more likely a significant number of users run into it

- As the ecosystem has grown, more people use skills and plugins, and we need to offer better tools and automation to ensure these are efficient

We do actually dogfood rate limits, so I think it's some combination of the above.

nothinkjustai 2 hours ago | parent | prev [-]

Because it’s completely vibe coded? And the codebase goes through massive churn, which means things that were stable get rewritten possibly with bugs.

egamirorrim 38 minutes ago | parent [-]

You can get Claude Code to write tests too...

_fizz_buzz_ 5 hours ago | parent | prev | next [-]

I have a feature request: I built an MCP server, but now it has over 60 tools, and most sessions I really don't need most of them. I suppose I could split it into several servers, but it might be nice to give the user more power here: let me choose which tools should be loaded, or let me build servers that group tools together so a group can be loaded as a unit. Not sure if that makes sense …
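The grouping idea can be sketched in plain Python (a conceptual illustration only; the group names and helper are hypothetical, and this is not the MCP SDK): register tools under named groups, then expose only the groups a session opts into, so 60 tool definitions don't all land in context.

```python
# Hypothetical tool registry grouped by purpose.
TOOL_GROUPS = {
    "db": ["query", "migrate", "seed"],
    "files": ["read_file", "write_file", "glob"],
    "deploy": ["build", "push", "rollback"],
}

def tools_for_session(enabled_groups: list[str]) -> list[str]:
    """Return only the tools whose group the user enabled."""
    return [t for g in enabled_groups for t in TOOL_GROUPS.get(g, [])]

print(tools_for_session(["db"]))  # ['query', 'migrate', 'seed']
```

Splitting one 60-tool server into several group-scoped servers achieves the same effect with today's tooling, at the cost of more configuration.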

throwpoaster 2 hours ago | parent | prev | next [-]

Have you tried asking Mythos for a fix?

earino 4 hours ago | parent | prev | next [-]

Hello Boris! How do I increase the 1 hour prompt cache window for the main agent? I would love to be able to set that to, say, 4 hours. That gives me enough time to work on something, go teach a class, grab a snack, and come back and pick up where I left off.

fluidcruft 5 hours ago | parent | prev | next [-]

How can we turn off 1M context? I don't find it has ever helped.

mwigdahl 4 hours ago | parent [-]

He mentioned this in his original comment:

"CLAUDE_CODE_AUTO_COMPACT_WINDOW=400000"

j45 5 hours ago | parent | prev | next [-]

Pulling in all the skills and agents in the world when they go unused is a big hit. I deleted all of mine and added them back as needed, and there was an improvement.

Running Claude Cowork in the background will also burn tokens, and it might not be the most efficient use of them.

Last but not least, turning off the 1M token context by default is helpful.

jauntywundrkind 3 hours ago | parent | prev | next [-]

There's an issue someone raised showing that prompt caches are only 5 minutes.

The reply seems to be: oh huh, interesting. Maybe that's a good thing, since people sometimes one-shot? That doesn't feel like the messaging I want to be reading, and the way it conflicts with the message here that the cache is 1 hour is confusing.

https://news.ycombinator.com/item?id=47741755

Is there any status information on whether the cache is being used? It sure looks like the person analyzing the 5m issue had to work extremely hard to get any kind of data. The iteration loop of people getting better at this stuff would go much, much better if it weren't such a black box, if we had the data to see and understand: is the cache helping?
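For direct API users, some of that data does exist: each response's usage block reports cached vs. uncached token counts (field names per Anthropic's API docs at the time of writing; the sample payload below is hard-coded for illustration):

```python
# Example usage payload as returned alongside an API response.
usage = {
    "input_tokens": 1200,               # uncached tokens processed
    "cache_creation_input_tokens": 0,   # tokens written to cache
    "cache_read_input_tokens": 95000,   # tokens served from cache
}

def cache_hit_ratio(u: dict) -> float:
    """Fraction of input tokens served from cache for one request."""
    total = (u["input_tokens"] + u["cache_creation_input_tokens"]
             + u["cache_read_input_tokens"])
    return u["cache_read_input_tokens"] / total if total else 0.0

print(round(cache_hit_ratio(usage), 3))  # 0.988
```

Claude Code itself doesn't surface these numbers in the UI, which is the poster's point; logging them from a wrapper or proxy is one way to see whether the cache is actually helping.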

3 hours ago | parent | prev | next [-]
[deleted]
re-thc 4 hours ago | parent | prev | next [-]

> To improve this, we have shipped a few UX improvements (eg. to nudge you to /clear before continuing a long stale session)

Is this really an improvement? Shouldn't this be something you investigate before introducing 1M context?

What is a long stale session?

If that's not how Claude Code is intended to be used, it might as well auto-quit after a period of time. And if it is an acceptable use case, then users shouldn't have to change their behavior.

> People pulling in a large number of skills, or running many agents or background automations, which sometimes happens when using a large number of plugins.

If this was an issue, there should have been a cap on it before the feature was released, only raised once you were sure it was fine. And what is "a large number"? How do we know what to do?

It feels like "AI" has improved speed but is in fact just cutting corners.

5 hours ago | parent | prev | next [-]
[deleted]
EGreg 4 hours ago | parent | prev | next [-]

Boris, is the KV cache TTL now reduced to 5 minutes from 1 hour?

I think this may be the biggest concern for people building tools on the API: https://github.com/anthropics/claude-code/issues/46829

I would argue that KV caching is a net gain for Anthropic, and that a well-maintained cache is the biggest thing that can generate induced demand and a thriving third-party ecosystem. https://safebots.ai/papers/KV.pdf

dkersten 5 hours ago | parent | prev | next [-]

Eh you say that every time and yet it keeps happening.

varispeed 4 hours ago | parent | prev | next [-]

Can you explain why Opus 4.6 suddenly becomes dumb as a sack of potatoes, even when the context is barely filled?

Can you explain why Opus 4.6 will come up with stupid solutions, only to arrive at a good one when you mention it is trying to defraud you?

I have a feeling the model is playing dumb on purpose to make the user spend more money.

This wasn't the case weeks ago, when it was actually working decently.

throwpoaster 2 hours ago | parent | prev | next [-]

Wait, what? If I get told to come back in three hours because I'm using the product too much, I get penalized when I resume?

What's the right way to work on a huge project, then? I've just been saying "Please continue" -- does that pop the quota?

MuffinFlavored 4 hours ago | parent | prev [-]

I wish people would pay more attention to:

* Anthropic is in some sense trying to run a business (not a charity), and must at least (eventually?) make money rather than subsidize usage forever

* "What a steal/good deal" the $100-$200/mo plans are compared to if they had to pay for raw API usage

and less attention to "how dare you reserve the right to tweak the generous usage patterns you open-endedly gave us, we are owed something!"

lbreakjai 3 hours ago | parent | next [-]

As an (ex) paying customer, I expect some consistency. I used to be satisfied with the value I got, until the limits changed overnight and I'd get a tenth of my previous usage.

If Anthropic is allowed to alter the deal whenever it likes, then I'd expect to be able to get my money back, pro rata, no questions asked.

logicchains 4 hours ago | parent | prev [-]

All of those apply to OpenAI and Codex too, but they're far more generous with limits than Anthropic, and with granting fresh limits as an apology when they fuck up.