chandureddyvari 8 hours ago

Claude has gotten noticeably worse for me too. It goes into long exploration loops for 5+ minutes even when I point it to the exact files to inspect. Then 30 minutes later I hit session limits. Three sessions like that in a day, and suddenly 25% of the weekly limit is gone.

I ended up buying the $100 Codex plan. So far it has been much more generous with usage and more accurate than Claude for the kind of work I do.

That said, Codex has its own issues. Its personality can be a bit off-putting for my taste. I had to add extra instructions in Agents.md just to make it less snarky. I was annoyed enough that I explicitly told it not to use the word “canonical.”

On UI/UX taste, I still think current Codex is behind the Jan/Feb era of Claude Code. Claude used to have much better finesse there. But for backend logic, hard debugging, and complex problem-solving, Codex has been clearly better for me. These days I use Impeccable Skillset inside Codex to compensate for the weaker UI taste, but it still does not quite match the polish and instinct Claude Code used to have.

I used to be a huge Claude Code advocate. At this point, I cannot recommend it in good conscience.

My advice now is simple: try the $20 plans for Codex and Cursor, and see which one matches your workflow and vibes best.

cedws 8 hours ago | parent | next [-]

I had a weird experience at work last week where Claude was just thinking forever about tasks and not actually doing anything. It was unusable. The next day it was fine again.

mstank 8 hours ago | parent | next [-]

That happens to me all the time. My current working theory is that when their servers are hammered, there's a queueing system that is invisible to end-users.

jatora 6 hours ago | parent [-]

I was having this issue yesterday. The same prompt would send it into a loop where it would appear to be doing nothing for 30+ minutes until I cancelled it. It would show 400 tokens used and that's it.

I tested on a previous version (2.1.68) and it still ran into this never-ending loop, BUT at least the token count kept steadily increasing.

So my guess is we are seeing 1. some sort of model degradation (which is why it can't break out of a thinking loop on some problems), as well as 2. a clear drop in thinking-token UI transparency.

cjonas 8 hours ago | parent | prev | next [-]

Ya, I've had this experience more than a few times recently. I've heard people claiming they serve quantized models during high load, but it happens in Cursor as well, so I don't think it's specific to Anthropic's subscription. It could be that the context window has just gotten into a state that confuses the model... but that wouldn't explain why it appears to be temporary...

My best guess is this is the result of the companies running "experiments" to test changes. Or it's just all in my head :)

whywhywhywhy 7 hours ago | parent [-]

Cursor one is back to Claude 4 or 3.5+ at best. Struggles to do things it did effortlessly a few weeks ago.

It's not under load either; it's just fully downgraded. It feels more like they're dialing in what they can get away with, and they're pushing it very far.

sunaookami 7 hours ago | parent | prev | next [-]

Set MAX_THINKING_TOKENS to 0; Claude's thinking hardly does anything and just wastes tokens. It actually often performs worse with thinking than without.
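For anyone who wants to try this: the variable is read from the environment when Claude Code starts, so a minimal sketch looks like the following (the launch command is commented out so nothing actually runs here):

```shell
# Disable extended thinking in Claude Code by zeroing its thinking budget.
# MAX_THINKING_TOKENS is read from the environment at startup.
export MAX_THINKING_TOKENS=0

# claude   # then launch as usual; thinking is now effectively off
```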

gruez 7 hours ago | parent [-]

Not the guy you're responding to, but when this happens the token counter is frozen at some low value (e.g. 1k-10k) as well, so it's not thinking in circles but rather not thinking (or doing anything, for that matter) at all.

jatora 6 hours ago | parent | next [-]

I was having this issue yesterday. The same prompt would send it into a loop where it would appear to be doing nothing for 30+ minutes until I cancelled it. It would show 400 tokens used and that's it. I tested on a previous version (2.1.68) and it still ran into this never-ending loop, BUT at least the token count kept steadily increasing.

So my guess is we are seeing 1. some sort of model degradation (which is why it can't break out of a thinking loop on some problems), as well as 2. a clear drop in thinking-token UI transparency.

When I left it running overnight, it finally sent a message saying it had exceeded the 64,000 output-token limit.

egeozcan 7 hours ago | parent | prev [-]

This exact thing is happening to me since yesterday. It comes back to life when I throw the whole session away.

freedomben 7 hours ago | parent | prev [-]

This happened to me as well! It was especially infuriating because I had just barely upgraded to the $200 per month plan because I exhausted my weekly quota. Then the entire next day was a complete bust because of this issue. I want my money back!

cedws 7 hours ago | parent [-]

What day was it?

freedomben 7 hours ago | parent [-]

Thursday starting mid to late morning, and ended Friday night (US timezone).

cedws 7 hours ago | parent [-]

Same day then. It was happening for me roughly between 9am and 5pm BST.

mixermachine 8 hours ago | parent | prev | next [-]

I've been using the Codex Business subscription (about 30€) for multiple months now. Even there they've cut back on the quota. A few months back it was hard for me to reach the limit; now it's easier.

Still, compared with Claude Code, Codex's quota is a much better deal. They shouldn't make it worse, though...

wheelerwj 8 hours ago | parent | next [-]

I have the exact opposite experience. I can run claude forever, my codex quota was done by Wednesday morning.

throwup238 8 hours ago | parent | prev [-]

OpenAI had a promotion that gave everyone double their rate limits until April 2nd.

virgildotcodes 7 hours ago | parent [-]

The promotion has been extended until May 31st for the $100 and $200 subs.

At the same time, they’ve been giving out a ton of additional quota resets seemingly every other week (and committed to an additional reset for every million additional users until they hit 10mil on codex).

So they’ve really set a high bar for people’s expectations on their quota limits.

Once they drop the 2x promotion for good and stop the frequent resets, there are going to be a lot of complaints.

dataviz1000 7 hours ago | parent | prev | next [-]

> Claude has gotten noticeably worse for me too. It goes into long exploration loops for 5+ minutes even when I point it to the exact files to inspect.

This is what I'm working on proving now.

It is more that there is a confidence score while thinking. Opus will quit if it is too high and will grind on if the confidence score is close to the real answer. Haiku handles this well too.

If you give Sonnet a hard task, it won't quit when it should.

Nonetheless, that issue has been fixed with Opus.

I'll try to show that using Opus on tasks of medium to hard difficulty is consistently the same price or cheaper than running them with Haiku and Sonnet, while easier tasks, the busy work that is already well understood, are cheaper to run with Haiku.

onlyrealcuzzo 8 hours ago | parent | prev | next [-]

> Claude has gotten noticeably worse for me too.

My experience is limited to CC, Gemini CLI, and Codex - not Aider yet - trying different combinations of different models.

But, from my experience, CC puts everything else to shame.

How does Cursor compare? Has anyone found an Aider combination that works as well?

chrismustcode 8 hours ago | parent | next [-]

Is aider even a thing considered anymore?

It was pretty much the first CLI agent, and its benchmark was the go-to at the start of LLM coding. Now the benchmark doesn't get updated, and Aider never gets a mention in discussions of CLI tools, until now.

faangguyindia 8 hours ago | parent [-]

Aider is dead because it's from the pre-function-calling era of this tech.

6 hours ago | parent | prev [-]
[deleted]
zozbot234 8 hours ago | parent | prev | next [-]

> It goes into long exploration loops for 5+ minutes even when I point it to the exact files to inspect.

Give it a custom sandbox and context for the work, so it has no opportunity to roam around when it's not required. Agentic AI coding is hugely wasteful of context and tokens in general (compared to generic chat, which is how most people use AI); there's a whole lot of scope for improvement there.
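One low-tech version of this, sketched below with made-up file names and a placeholder `agent` command (not a real tool): hand the agent a throwaway working copy containing only the files the task touches, so there is nothing else for it to roam through.

```shell
# Stand-in project so the sketch is self-contained (illustrative only).
mkdir -p proj/src
echo 'TAX_RATE = 0.2' > proj/src/billing.py

# Build a minimal sandbox holding just the files relevant to the task.
mkdir -p /tmp/agent-sandbox/src
cp proj/src/billing.py /tmp/agent-sandbox/src/

# Run the agent inside the sandbox, where there is nothing extra to explore.
# cd /tmp/agent-sandbox && agent "fix the rounding bug in src/billing.py"
```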

egeozcan 8 hours ago | parent | next [-]

But the problem is that it didn't use to need that. These days, you have to think twice before you summon a subagent.

lelanthran 13 minutes ago | parent [-]

> But the problem is that it didn't use to need that. These days, you have to think twice before you summon a subagent.

This is exactly what I (and many others) kept trying to tell the pro-AI folks 18 months ago: there is no value in jumping on a product early, because any "experience" you have with it is easily gained by newcomers, and anything you learned can be swapped out from under you anyway.

imglorp 8 hours ago | parent | prev [-]

The sandbox is fine, but if the parent has given explicit instructions about which files to inspect, why isn't it centering there? Is the recent breakage that the base prompt makes it always try to explore for more context even if you try to focus it?

zozbot234 8 hours ago | parent | next [-]

Because the "explicit instruction" you give AI is not deterministic as in a normal computer program. It's a complete black box and the context is also most likely polluted by all sorts of weird stuff. Putting it on as tight of a leash as possible should be seen as normal.

zarzavat 7 hours ago | parent | prev [-]

They changed plan mode so that it's instructed to follow a multi-step plan, the first step being to explore the code base. When you tell it to focus it's getting contradictory instructions from plan mode vs your prompt and it's essentially a coin flip which one it picks.

It does seem like a cynical attempt to make more money.

Rekindle8090 7 hours ago | parent | prev | next [-]

The product was performing badly and you thought this would be solved by spending more money on it?

When will people realize this is the same as vendor lock-in?

"Maybe if I spend more money on the max plan it will be better" > no it will be the same "Maybe if I change my prompt it will work" > no it will be the same "Maybe if I try it via this API instead of that API it will improve" > no it will be the same.

Claude, ChatGPT, Gemini etc all of these SOTA models are carefully trained, with platforms carefully designed to get you to pay more for "better" output, or try different things instead of using a different product.

It's to keep you in the ecosystem and keep you exploring. There is a reason you can't see the layers upon layers of scaffolding they have. And there's a reason why, two weeks after a major update, the model is suddenly "bad" and "frustrating". It's the same reason it's done with A/B testing: when you complain, someone else has no issues; when they complain, you have no issues. It muddies the water intentionally.

None of it is because you're doing anything wrong, it's not a skill issue, it's a careful strategy to extract as much engagement and money from customers as possible. It's the same reason they give people who buy new gun skins in call of duty easier matches in matchmaking for the first couple games.

The only mistake you made was paying MORE, hoping it would get better. It won't; that's not what makes them money. Making people angry, wasting their time while others have no issues, and making them explore and try different things for longer, so they can show investors how long people use these AI tools, is what makes them money.

When competitors have a better product, these issues go away. When a new model is released, these issues don't exist.

I was paying a ton of money for Claude. Once I stopped and cancelled my subscription entirely, suddenly Sonnet 4.6 performs like Opus, and I don't have prompts using 10% of my quota in one message despite being the same complexity.

athorax 7 hours ago | parent [-]

Do you realize Claude and Codex are different products by different companies?

ImPostingOnHN 7 hours ago | parent [-]

You ask that as if the question contains some insight, but the insight is hard to find. What the person you replied to is saying applies to both Claude and Codex.

yaur 5 hours ago | parent | prev | next [-]

When they bumped the context size up to 1M tokens, they made it much easier to blow through session limits quickly unless you manually compact or keep sessions short.

comboy 7 hours ago | parent | prev | next [-]

Any good, reasonable alternatives? Gemini is like a prodigious 3-year-old, hopeless for my projects. Has anybody tested OpenCode with Kimi or something?

eurekin 7 hours ago | parent [-]

I'm adding two extra GPUs to my local rig. It turns out Qwen 3.5 122B is already enough to handle (i.e. finish with moderate guidance) the non-planning parts of my tasks.
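For anyone curious about a similar local setup, here is a sketch of serving a local model with an OpenAI-compatible API and pointing tools at it. The model name, port, and two-GPU split are assumptions, not the poster's exact setup, and the serve command is commented out since it needs the GPUs.

```shell
# Serve a local Qwen-family model across two GPUs with an OpenAI-compatible
# API via vLLM (model name and flags are illustrative):
# vllm serve Qwen/Qwen2.5-72B-Instruct --tensor-parallel-size 2 --port 8000

# Most coding tools that speak the OpenAI API can then be pointed at it:
export OPENAI_BASE_URL="http://localhost:8000/v1"
export OPENAI_API_KEY="local"   # any non-empty string works for a local server
```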

zamalek 7 hours ago | parent | prev | next [-]

I am also on Codex, while Claude seems to be blatantly ignoring instructions (as recently as Thursday, when I made the switch). The huge Claude context helps with planning, so that's all it does now.

Codex consumes way fewer resources and is much snappier.

varispeed 4 hours ago | parent | prev | next [-]

I wonder if this is in the system prompt: "Go round in circles to make us more money."

jen20 7 hours ago | parent | prev | next [-]

> On UI/UX taste, I still think current Codex is behind the Jan/Feb era of Claude Code.

OpenCode is great though, and can (for now) use an OpenAI subscription.

dvfjsdhgfv 7 hours ago | parent | prev | next [-]

By the way, what are you using it for? I bought Max and Pro plans for Claude and Codex, developed a few apps with them, and after the initial excitement ("Wow, I can get results 10x faster!") I felt the net sum was negative for me. I didn't learn much except the current quirks of each model/tool, I didn't enjoy the whole process, and the end result was not good enough for my standards. In the end I deleted all these projects and unsubscribed.

chandureddyvari 6 hours ago | parent [-]

For me it’s mostly useful in day-to-day coding, not “build an entire app and walk away” coding.

TDD was never really my natural style, but LLMs are great at generating the obvious test cases quickly. That lets me spend more of my attention on the edge cases, the invariants, and the parts that actually need judgment.

Frontend is another area where they help a lot. It’s not my strongest side, so pairing an LLM with shadcn/ui gets me to a decent, responsive UI much faster than I would on my own. Same with deployment and infra glue work across Cloudflare, AWS, Hetzner, and similar platforms.

I’m basically a generalist with stronger instincts in backend work, data modeling, and system design. So the value for me is that I can lean into those strengths and use LLMs to cover more ground in the areas where I’m weaker.

That said, I do think this only works if you’re using them as leverage, not as a substitute for taste or judgment.

stavros 7 hours ago | parent | prev | next [-]

Codex has been better for me, but it's WAY too nitpicky/defensive. It always wants to make changes that add complexity and code to solve a problem that can't actually happen (e.g. a multiprocess race condition in a daemon I only ever run one instance of).
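To make the complaint concrete, here is a hypothetical sketch of the kind of defensive guard being described: an flock-based single-instance lock bolted onto a daemon that is only ever launched once. All names here are made up for illustration; this is not code from any real tool.

```python
import fcntl
import os

# Illustrative path; a real daemon would likely use /var/run or XDG_RUNTIME_DIR.
LOCK_PATH = "/tmp/mydaemon.lock"

def acquire_single_instance_lock(path: str = LOCK_PATH):
    """Return an open file holding an exclusive flock, or raise RuntimeError
    if another instance (another open file description) already holds it."""
    fh = open(path, "w")
    try:
        # Non-blocking exclusive lock: fails immediately if already held.
        fcntl.flock(fh, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        fh.close()
        raise RuntimeError("another instance is already running")
    fh.write(str(os.getpid()))
    fh.flush()
    return fh  # keep this object alive; closing it releases the lock
```

Perfectly reasonable code in the abstract; the complaint is that it gets proposed for daemons where a second instance is never launched in the first place.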

comboy 7 hours ago | parent | next [-]

You just convinced me to try it. Claude just copy-pastes, does search and replace, zero abstractions, and I'm the one who needs to think about the edge cases.

stavros 6 hours ago | parent [-]

That's why I have Claude write the code and Codex review.

bdangubic 6 hours ago | parent [-]

that's like having Oleg Kiselyov's code reviewed by my middle-school daughter :)

stavros 6 hours ago | parent [-]

I didn't know your middle school daughter is a genius coder, congratulations!

hk__2 6 hours ago | parent | prev [-]

Same here; that’s very annoying because it adds a lot of entropy to the code, and people don’t always take the time to clean things up.

bethekidyouwant 8 hours ago | parent | prev [-]

[flagged]