comboy 5 hours ago

Unrelated, but Claude was performing so tragically over the last few days, maybe week(s), but mostly days, that I had to reluctantly switch. Reluctantly because I enjoy it. Even the most basic stuff: most Python scripts it has to rerun because of some syntax error.

The new reality of coding took away one of the best things for me: that the computer always just does what it is told to do. If the results are wrong, it means I'm wrong, I made a bug and I can debug it. Here... I'm not a hater, it's a powerful tool, but... it's different.

taspeotis 2 hours ago | parent | next [-]

https://marginlab.ai/trackers/claude-code/

bluegatty 4 hours ago | parent | prev | next [-]

Codex with 5.4 xhigh. It's a bad communicator but does the job.

elAhmo 2 hours ago | parent [-]

You mean codex (client) with GPT 5.4 xhigh? I am using Codex 5.3 (model) through Cursor, waiting for Codex 5.4 model as I had great experience so far with 5.3.

bluegatty an hour ago | parent [-]

yes codex. it has 5.4.

pacha3000 5 hours ago | parent | prev [-]

I'm usually the first to be tired of people who say, for every model, "uuuh, it became dumber", because I never believed them

... until this week! Opus has been struggling worse than Sonnet these last two weeks.

saghm 3 hours ago | parent | next [-]

Forget the agent itself being dumber: right now I'm getting an "API error: usage limit exceeded" message whenever I try anything despite my usage showing as 26% for the session limit and 8% for the week (with 0/5 routines, which I guess is what this thread is about). This is with the default model and effort, and Claude Code is saying I need to turn on extra usage for it to work. Forget that, I just canceled my subscription instead.

There's utility in LLMs for coding, but having literally the entire platform vibe-coded is too much for me. At this point, I might genuinely believe they're not intentionally watering anything down, because it's incredibly believable that they just have no clue how any of it works anymore.

girvo 3 hours ago | parent | prev | next [-]

My favourite: Opus 4.6 last night (to be fair, peak IST time, late afternoon my time), on the first prompt with a small context, jams a copy-pasted function in between a bunch of import statements, doesn't even wire up its own function, and calls it done. Wild, I've not seen failure states like that since old Sonnet 4.

jpcompartir 4 hours ago | parent | prev | next [-]

Likewise, I foolishly assumed everybody else was just doing it wrong.

But this week I've lost count of the times I've had to say something along the lines of: "Can you check our plan/instructions, I'm pretty sure I said we need to do [this thing] but you've done [that thing]..."

And get hit with a "You're absolutely right...", which virtually never happened for me before. I think maybe once since Opus 4.6.

comboy 5 hours ago | parent | prev | next [-]

Pretty reassuring to hear that. I was skeptical too; there are a lot of variables, like some crap added to memory, a specific skill, or custom instructions interfering with the workflow and whatnot. But now it was like a toddler that consumes money when talking.

timacles 3 hours ago | parent [-]

It's quite an interesting business model, actually: the worse it performs (up to a point), the more money it makes, because of the token churn.

combyn8tor 3 hours ago | parent | prev | next [-]

In my experience Opus and Claude have declined significantly over the past few weeks. It actually feels like dealing with an employee that has become bored and intentionally cuts corners.

rishabhaiover 43 minutes ago | parent [-]

And the worst part is the company gaslighting people when they report it

qingcharles 3 hours ago | parent | prev [-]

Is it the model, though? Or is it the task you're trying to do? Opus 4.6 has been staggeringly good for me this last week, both inside Claude Code and through Antigravity, until I used up my quota.