wood_spirit 7 hours ago

So many of my coworkers and I have been struggling with a big cognitive decline in Claude over the last two months. 4.5 was useful and 4.6 was great. I had my own little benchmark: 4.5 could just about keep track of a two-way pointer-merge loop, 4.6 managed a three-way merge, and the 1M-context model managed a k-way merge. And this ability to track braids directly helped it understand real production code, make changes, and be useful.
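For anyone curious, the kind of k-way pointer-merge loop I mean looks roughly like this. This is a minimal sketch of the task shape, not my actual benchmark prompt; the function name and inputs are mine for illustration:

```python
import heapq

def k_way_merge(lists):
    """Merge k sorted lists by advancing one pointer per list.

    Each heap entry is (value, list index, position), i.e. one
    "pointer" into one list. The model's job in the benchmark is
    to mentally track all k pointers at once as the loop runs.
    """
    # Seed the heap with the head of every non-empty list.
    heap = [(lst[0], i, 0) for i, lst in enumerate(lists) if lst]
    heapq.heapify(heap)
    out = []
    while heap:
        val, i, j = heapq.heappop(heap)
        out.append(val)
        # Advance list i's pointer if it has elements left.
        if j + 1 < len(lists[i]):
            heapq.heappush(heap, (lists[i][j + 1], i, j + 1))
    return out

# k = 3, the level 4.6 could track:
print(k_way_merge([[1, 4, 7], [2, 5, 8], [3, 6, 9]]))
```

Two-way is the k = 2 special case; the braiding the model has to follow gets harder as k grows because more interleaved pointers are live at once.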

But then two months ago 4.6 started getting forgetful and making very dumb decisions. Everyone started comparing notes and realising it wasn’t “just them”. 4.7 isn’t much better, and for the last few weeks we keep having to battle the automatic effort-level downgrade. So much friction when you think “that was dumb” and have to go check the settings again, only to see there has been another silent downgrade.

We all miss the early days of 4.6, which just shows you can have a good, useful model. LLMs can be really powerful, but in delivering them to the mass market Anthropic throttles and downgrades them until they’re no longer useful.

My thinking is that DeepSeek will soon reach the more-than-good-enough 4.6+ level, and everyone can get off the Claude pay-more-for-less trajectory. We don’t need much more than what we’ve already had a glimpse of and now know is possible. We just need it in our control, provisioned rather than metered, so we can depend upon it.

hungryhobbit 7 hours ago | parent | next [-]

This was a real issue, and Anthropic recently acknowledged it:

https://www.anthropic.com/engineering/april-23-postmortem

Of course, it sucks when companies screw up ... but at the same time, they "paid everyone back" by removing limits for a while, and (more importantly to me) they were transparent about the whole thing.

I have a hard time seeing any other major AI provider being this transparent, so while I'm annoyed at Claude ... I respect how they handled it.

swdunlop 6 hours ago | parent | next [-]

Amusingly, when a coworker was looking for this postmortem, they found a different postmortem of three Claude issues that caused decay. This one was in the platform, not in Claude Code:

https://www.anthropic.com/engineering/a-postmortem-of-three-...

I think there's a certain amount of running with scissors going on here. I appreciate the transparency, but the time to remediation here seems pretty long compared to the rate of new features.

aulin an hour ago | parent [-]

I've been using Opus 4.6 from GitHub Copilot and it's most definitely not limited to Claude Code. And it's not fixed.

wood_spirit 7 hours ago | parent | prev [-]

Yes, that was one issue. It’s not the general degradation I have been talking about, though, which is ongoing.

I recall reading similar tales of woe with other providers here on HN. I think the gradual dialling back of capability as capacity becomes strained as users pile on is part of the MO of all the big AI companies.

felixgallo 7 hours ago | parent [-]

the 'general degradation' is a myth. Check out https://isitnerfed.org/.

mceachen 4 hours ago | parent [-]

Random crowd anecdata is still anecdata.

isoprophlex 7 hours ago | parent | prev [-]

did you set your 4.7 to xhigh or max effort? anything else is basically not worth your time...

Flavius 4 hours ago | parent [-]

Why would I set 4.7 to xhigh or max when the original 4.6 was doing just fine with medium and high?
