denysvitali 5 hours ago

OpenAI (Codex) keeps on resetting the usage limits each time they fuck up...

I have yet to see Anthropic do the same. Sorry, but this whole thing seems quite deliberate.

losteric 5 hours ago | parent | next [-]

It doesn’t seem like Anthropic is fucking up?

I use Claude Code extensively, about 8 hours every workday, and have yet to see any issues.

It really does seem like PEBKAC.

mlinsey 5 hours ago | parent | next [-]

Different users do seem to run into problems, or not, based on their behavior, but for a rapidly evolving tool with new and unclear footguns, I wouldn't characterize that as user error.

For example, I don't pull in tons of third-party skills, preferring to keep a small list of ones I write and update myself. But it's not at all obvious to me that pulling in a big list of third-party skills (like I know a lot of people do with superpowers, gstack, etc...) would cause quota or cache-miss issues, and if that's what's causing problems, I'd call it a UX footgun rather than user error. Same with the 1M context window: a heavily-touted feature that's apparently not something you actually want to take advantage of...
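A minimal sketch of why that could bite, assuming prefix-based prompt caching as Anthropic's API docs describe it: the cached prefix is keyed on its exact contents, so a skill that auto-updates near the top of the prompt invalidates everything cached after it.

    # Toy model of prefix-keyed caching; the hashing here is illustrative,
    # not Anthropic's actual mechanism.
    import hashlib

    def cache_key(prompt_parts: list[str]) -> str:
        return hashlib.sha256("".join(prompt_parts).encode()).hexdigest()[:12]

    base = ["system prompt", "skill: reviewer v1", "long conversation..."]
    print(cache_key(base[:2]))        # key for the cached prefix

    base[1] = "skill: reviewer v2"    # a third-party skill silently updates
    print(cache_key(base[:2]))        # new key -> full-price cache miss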

denysvitali 5 hours ago | parent | prev | next [-]

My colleagues and I have faced the same issues over the last month or so.

With a new version of Claude Code shipping pretty much every day, constant changes to the usage rules (2x outside of peak hours, temporarily 2x for a few weeks, ...), hidden usage decisions (past 256k, it looks like your usage consumes your limits faster) and model degradation (Opus 4.6 is now worse than Opus 4.5, as many have reported), I fail to see how this can be user error.

The only user error I see here is still trusting Anthropic to act in good faith, tbh.

If you need to hear it from someone else: https://www.youtube.com/watch?v=stZr6U_7S90

bcherny 5 hours ago | parent [-]

> past 256k it looks like your usage consumes your limits faster

This is false. My guess is that what's happening is #1 above, where restarting a stale session causes a 256k-token cache miss.
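Back-of-the-envelope, with illustrative multipliers (Anthropic's published API pricing bills cache reads at roughly 0.1x the base input rate and cache writes at roughly 1.25x), this is why one cold restart of a long session is so visible against a rate limit:

    # Relative cost of one turn at 256k context, warm cache vs. cold restart.
    # The multipliers are illustrative, not Anthropic's internal accounting.
    BASE = 1.0          # cost per uncached input token (relative)
    CACHE_READ = 0.1    # cached-prefix tokens are ~10x cheaper to read
    CACHE_WRITE = 1.25  # re-processing and re-caching the whole prefix

    def turn_cost(context_tokens: int, cache_hit: bool) -> float:
        if cache_hit:
            return context_tokens * CACHE_READ
        return context_tokens * BASE * CACHE_WRITE

    ctx = 256_000
    print(turn_cost(ctx, True))   # 25,600  token-equivalents (warm)
    print(turn_cost(ctx, False))  # 320,000 token-equivalents (cold, 12.5x)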

That said, I hear the frustration. We are actively working on improving rate limit predictability and visibility into token usage.

tetraodonpuffer 4 hours ago | parent [-]

Just like everybody else, my colleagues at work and I have seen major regressions in available usage over the past month, seemingly unrelated to caching/resuming. On an enterprise sub, doing the same work, I personally went from being able to run several sessions concurrently without hitting limits to having only one session at a time and hitting my 5h limit twice a day, in 3-4 hours tops. (And due to the apparent lower intelligence, I have been at the terminal watching what Opus is doing like a hawk, so it's not a case of "I went for coffee and came back to a stale cache".) The first day I ever hit my 5h limit this year was the day everybody reported it (I think it was the Monday you introduced the 2x off-peak promotion? Not sure, like 3 weeks ago?)

To avoid 1M issues, this week I have also intentionally used the 256k-context model, disabled adaptive thinking, and done the usual "plans in multiple short steps with /clear in between" to minimize context usage, and yet nothing helps. It just feels like I get ~2x to ~3x fewer tokens than before, and the model feels a lot less smart than in February.
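For reference, the back-of-the-envelope math behind the /clear habit, as a minimal sketch with made-up per-turn numbers: every turn re-sends the whole conversation as input, so tokens grow quadratically over a long session unless you reset.

    # Toy accounting with invented per-turn sizes: total input tokens billed
    # over a session, with and without periodic /clear.
    def total_input_tokens(turns: int, tokens_per_turn: int, clear_every: int) -> int:
        total, ctx = 0, 0
        for t in range(turns):
            if clear_every and t % clear_every == 0:
                ctx = 0              # /clear: context starts empty again
            ctx += tokens_per_turn   # prompt + response get appended
            total += ctx             # the whole context is sent as input
        return total

    print(total_input_tokens(40, 2_000, 0))   # one long session: 1,640,000
    print(total_input_tokens(40, 2_000, 10))  # /clear every 10:   440,000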

Nowadays, every time I complete a plan, I spend several sessions afterwards saying things like "we have done plan X, the changes are uncommitted, can you take a look at what we did", and every time it finds things that were missed, or outright (bad) shortcuts/deviations from the plan, despite my settings.json having a clear "if in doubt, ask the user; don't just take the easy way out". As a random data point, just today Opus, halfway through a session, told me to make a change to code inside a pod and then rollout-restart it to use said change (the restart recreates the pod from its image, discarding the edit), and when called out on it, it of course said that I was right and of course that wouldn't work...

It is understandable that, given your incredible growth, you are between a rock and a hard place and have to tweak limits; compute does not grow on trees. But the constant "you are holding it wrong" messaging is not helpful. I wonder if realistically your only option is to move everybody to metered billing, with clear token usage displayed, and have Pro/Max 5/Max 20 just be "your first $x of tokens is 50/75% off". Allow folks to tweak the thinking budget, change the system prompt to remove things like "try the easy solution first" (which anecdotally was introduced in the past while), and let users verify, per prompt, whether the whole context would be re-sent or the cache is still available.
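To make that tier idea concrete, a minimal sketch with made-up caps and discounts (nothing here reflects actual Anthropic pricing):

    # Hypothetical metered billing where a plan is just a discount on your
    # first chunk of token spend; tier_cap and discount are invented numbers.
    def monthly_bill(token_spend_usd: float, tier_cap: float, discount: float) -> float:
        discounted = min(token_spend_usd, tier_cap) * (1 - discount)
        overage = max(token_spend_usd - tier_cap, 0.0)
        return discounted + overage

    # e.g. a made-up "Max 20"-style tier: first $400 of tokens at 75% off
    for spend in (100, 400, 900):
        print(f"${spend} of tokens -> ${monthly_bill(spend, 400, 0.75):.2f}")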

mvkel 5 hours ago | parent | prev | next [-]

Why did it suddenly become an issue, despite prompt caching behavior being unchanged?

ScoobleDoodle 5 hours ago | parent | prev | next [-]

PEBKAC: Problem Exists Between Keyboard And Chair

extr 5 hours ago | parent | prev | next [-]

Yes, same here. I've used CC almost constantly, every day, for months, across personal and work Max/Team accounts, as well as directly via the API on Google Vertex. I have hardly ever noticed an issue (aside from occasional outages/capacity problems, for which I switch to API billing on Vertex). If anything, it works better than ever.

varispeed 4 hours ago | parent | prev [-]

You know that people are not all using the same resources, right? It's as if 9 out of 10 computers get borked, you have the one that seems okay, and you essentially say "my computer works fine, therefore all computers work fine." Come on, dude.

weird-eye-issue 5 hours ago | parent | prev | next [-]

Can you clearly state what they messed up?

nodja 5 hours ago | parent [-]

Not the parent, but I can guess, having watched mostly from the sidelines.

They introduced a 1M context model semi-transparently, without realizing the effects it would have, then refused to "make it right" for the customer, which is something most people expect from a business when they spend money on it, especially in the US, and especially when the money spent is often in the thousands of dollars.

Unless Anthropic has some secret sauce, I refuse to believe that their models perform anywhere near as well at >300k context sizes as they do at 100k. People don't realize it, but even a small drop in success rate becomes very noticeable if you're used to near-100%; i.e. 99% -> 95% is more noticeable than 55% -> 50%.
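To put numbers on that intuition (a trivial sketch; the rates are the ones from the sentence above):

    # Users experience failures, so the relative growth in failure rate is
    # what they notice, not the absolute drop in success rate.
    def failure_growth(success_before: float, success_after: float) -> float:
        return (1 - success_after) / (1 - success_before)

    print(failure_growth(0.99, 0.95))  # 5.0   -> 5x more failed attempts
    print(failure_growth(0.55, 0.50))  # ~1.11 -> barely noticeable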

I got my first Claude sub last month (it expires in 4 days) and I've used it on some biggish projects with opencode. It went from compacting after 5-10 questions to just expanding the context window. I personally notice it deteriorating somewhere between 200-300k tokens, and at that point I either fork a previous context or start a new one, because at that size even compacting seems to generate subpar summaries. It currently no longer works with opencode, so I can't attest to how well it has worked over the past week or so.

If the 1M model introduction is at fault for this mass user perception that the models are getting worse, then it's Anthropic's fault for introducing confusion into the ecosystem. Even if there were zero problems introduced and the 1M model were perfect, if your response when users complain is to blame the user, then don't expect the user to be happy. Nobody wants to hear "you're holding it wrong", but it seems that Anthropic is trying to be the Apple of LLMs in all the wrong ways as well.

atonse 4 hours ago | parent | next [-]

I still love Claude and have nothing but respect for Boris and the team building such a phenomenal product.

That said, things started to feel a bit off usage-wise after the introduction of the 1M context.

I'd personally be happy to disable it and go back to auto-compacting because that seems to have been the happy medium.

logicchains 4 hours ago | parent | prev [-]

Especially since Codex faced the same issue but the team decided to explicitly default to only ~200k context to avoid surprises and degradation for users.

Madmallard 5 hours ago | parent | prev [-]

Money money money money