Anthropic releases used to feel thorough and well done, with the models feeling immaculately polished. It felt like using a premium product, and it never felt like they were racing to keep up with the news cycle, or reply to competitors.

Recently that immaculately polished feel is harder to find. It coincides with the daily releases of CC and the Desktop App, and with unknown/undocumented changes to the various harnesses used in CC/Cowork. I find it an unwelcome shift.

I still think they're the best option on the market, but the delta isn't as high as it was. Sometimes slowing down is the way to move faster.

bcherny 7 hours ago

Boris from the Claude Code team here. We agree, and will be spending the next few weeks increasing our investment in polish, quality, and reliability. Please keep the feedback coming.

batshit_beaver 7 hours ago

> investment in polish, quality, and reliability

For there to be any trust in the above, the tool needs to behave predictably day to day. It shouldn't be possible to open your laptop and find that Claude suddenly has an IQ 50 points lower than yesterday. I'm not sure how you can achieve predictability while keeping inference costs in check and messing with quantization, prompts, etc on the backend.

A better approach might be to version both the models and the system prompts, but frequently adjust the pricing of a given combination based on token efficiency, to encourage users to switch to cheaper modes on their own. Let users choose how much they pay for a given quality of output, though.

pkos98 7 hours ago

Sure, I've cancelled my Max 20 subscription because you guys prioritize cutting your costs/increasing token efficiency over model performance. I use expensive frontier labs to get the absolute best performance, else I'd use an Open Source/Chinese one.

Frontier LLMs still suck a lot, you can't afford planned degradation yet.

wilj 6 hours ago

My biggest problem with CC as a harness is that I can't trust "Plan" mode. Long-running sessions frequently start bypassing plan mode and executing, updating files and so on without permission, while still in plan mode. And the only recovery seems to be to quit and reload CC.

Right now my solution is to run CC in tmux and keep a 2nd CC pane with /loop watching the first pane and killing CC if it detects plan mode being bypassed. Burning tokens to work around a bug.

tkgally 4 hours ago

Here's one person's feedback. After the release of 4.7, Claude became unusable for me in two ways: frequent API timeouts when using exactly the same prompts in Claude Code that I had run problem-free many times previously, and absurdly slow interface response in Claude Cowork. I found a solution to the first after a few days (add "CLAUDE_STREAM_IDLE_TIMEOUT_MS": "600000" to settings.json), but as of a few hours ago Cowork--which I had thought was fantastic, by the way--was still unusable despite various attempts to fix it with cache clearing and other hacks I found on the web.

a-dub 7 hours ago

hm. ML people love static evals and such, but have you considered approaches that typically appear in SaaS (slow rollouts, org/user-constrained testing pools with staged rollouts, real-world feedback from actual usage data where privacy policy permits)?
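
For readers unfamiliar with the SaaS practice mentioned here, a staged rollout is often built on deterministic bucketing: each user is hashed into a stable slot, and a change is enabled for everyone whose slot falls below the current percentage. The following is a minimal sketch of that general technique only; the function names and the feature label are hypothetical and say nothing about how Anthropic actually ships changes.

```python
import hashlib


def rollout_bucket(user_id: str, feature: str) -> float:
    """Map a (feature, user) pair to a stable value in [0, 100).

    Hashing the feature name together with the user id means each
    feature gets its own independent ordering of users.
    """
    digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).digest()
    # Use the first 8 bytes as an integer, then reduce to 0.00 .. 99.99.
    return (int.from_bytes(digest[:8], "big") % 10_000) / 100.0


def in_rollout(user_id: str, feature: str, percent: float) -> bool:
    """Return True if this user is inside the first `percent` of the rollout."""
    return rollout_bucket(user_id, feature) < percent


if __name__ == "__main__":
    # Hypothetical feature label and user ids, for illustration only.
    users = [f"user-{i}" for i in range(1000)]
    for percent in (1, 10, 50, 100):
        enabled = sum(in_rollout(u, "new-system-prompt", percent) for u in users)
        print(f"{percent:>3}% stage: {enabled} of {len(users)} users enabled")
```

Because the bucket is a pure function of the user and the feature, a user who is enabled at the 10% stage stays enabled at 50%, so widening the rollout never flips anyone back and forth between behaviours.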

g4cg54g54 2 hours ago

> Please keep the feedback coming

if only there were a place with 9,881 pieces of feedback waiting to be triaged...

and maybe not triaged by a duplicate-bot that goes wild and just auto-closes everything. Just blessing some of the stuff there with a "you've been seen" label would go a long way...

oefrha an hour ago

Common pattern of checking the Claude Code issue tracker for a bug: land on issue #12587, auto-closed as duplicate of #12043; check #12043, auto-closed as duplicate of #11657; check #11657, auto-closed as duplicate of #10645; check #10645, never got a response, or closed as not planned, or some other bullshit.
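
The manual hop-by-hop lookup described above amounts to following a "duplicate of" pointer until it runs out. A minimal sketch of that walk, assuming the links have already been collected into a plain dictionary (the mapping below just encodes the issue numbers from the comment; nothing here calls a real issue-tracker API):

```python
from typing import Dict, List

# Hypothetical, hand-written mapping of issue -> issue it was closed as a duplicate of.
DUPLICATE_OF: Dict[int, int] = {
    12587: 12043,
    12043: 11657,
    11657: 10645,
}


def resolve_duplicate_chain(issue: int, duplicate_of: Dict[int, int]) -> List[int]:
    """Follow duplicate links from `issue` and return every issue visited.

    The last element is the issue where the chain ends. A `seen` set
    guards against cycles (A marked duplicate of B, B of A).
    """
    chain = [issue]
    seen = {issue}
    while chain[-1] in duplicate_of:
        nxt = duplicate_of[chain[-1]]
        if nxt in seen:
            break  # cycle detected, stop rather than loop forever
        chain.append(nxt)
        seen.add(nxt)
    return chain


if __name__ == "__main__":
    print(resolve_duplicate_chain(12587, DUPLICATE_OF))
    # prints [12587, 12043, 11657, 10645]
```

A triage bot that resolved the chain this way before closing could at least point new reports directly at the final issue instead of at another closed duplicate.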

jpcompartir 6 hours ago

Thanks, I have a lot of trust in and admiration for the team & respect for the work you guys have done and continue to do.

szmarczak 7 hours ago

Why ban third party wrappers? All of this could've been sidestepped had you not banned them.

ElFitz 7 hours ago

Because then they lose vertical integration and the extra ability it grants to tune settings to reduce costs / token use / response time for subscription users.

Or improve performance and efficiency, if we’re generous and give them the benefit of the doubt.

It makes sense, in a way. It means the subscription deal is something along the lines of a fixed / predictable price in exchange for Anthropic controlling usage patterns, scheduling, throttling (quota consumption), defaults, and effective workload shape (system prompt, caching) in whatever way best optimises the system for them (or us if, again, we’re feeling generous) and makes the deal sustainable for them.

It’s a trade-off.

cmrdporcupine 6 hours ago

They gained that ability to tune settings and then promptly used it in a poor way and degraded customer experience.

szmarczak 6 hours ago

Nothing you wrote makes sense. The limits exist so Anthropic isn't operating at a loss. If they can customize Claude using Code, I see no reason why they couldn't do so with other wrappers. Other wrappers can also make use of the cache.

If you worry about "degraded" experience, then let people choose. People won't be using other wrappers if they turn out to be bad. People ain't stupid.

ElFitz 5 hours ago

By imposing the use of their harness, they control the system prompt:

> On April 16, we added a system prompt instruction to reduce verbosity. In combination with other prompt changes, it hurt coding quality, and was reverted on April 20. This impacted Sonnet 4.6, Opus 4.6, and Opus 4.7

They can pick the default reasoning effort:

> On March 4, we changed Claude Code's default reasoning effort from high to medium to reduce the very long latency—enough to make the UI appear frozen—some users were seeing in high mode

They can decide what to keep and what to throw out (beyond simple token caching):

> On March 26, we shipped a change to clear Claude's older thinking from sessions that had been idle for over an hour, to reduce latency when users resumed those sessions. A bug caused this to keep happening every turn for the rest of the session instead of just once, which made Claude seem forgetful and repetitive. We fixed it on April 10. This affected Sonnet 4.6 and Opus 4.6

It literally is all in the post.

I don't worry about anything though. It's not my product. I don't work for Anthropic, so I really couldn't care less about anyone else's degraded (or not) experience.

szmarczak 2 hours ago

> they control the system prompt

They control the default system prompt. You can change it if you want to.

> They can pick the default reasoning effort

Don't see how it's an obstacle in allowing third party wrappers.

> They can decide what to keep and what to throw out

That's actually a good point. However I still don't think it's an obstacle. If third party wrappers were bad, people simply wouldn't be using them.

troupo 7 hours ago

And you didn't invest anything in polish, quality and reliability before... why? Because for any questions people have you reply something like "I have Claude working on this right now" and have no idea what's happening in the code?

A reminder: your vibe-coded slop required a peak of 68GB of RAM, and you had to hire actual engineers to fix it.

cmrdporcupine 6 hours ago

I think you're being a bit harsh.

... But then again, many of us are paying $100 or $200 USD a month out of pocket.

Far more than any other development tools.

Services that cost that much money generally come with expectations.

troupo 5 hours ago

Here's Jarred Sumner of Bun saying they reduced peak consumption from 68GB to 1.7GB: https://x.com/jarredsumner/status/2026497606575398987 Anthropic had acquired Bun just 3 months prior.

A month prior their vibe-coders were unironically telling the world how their TUI wrapper for their own API is a "tiny game engine" as they were (and still are) struggling to output a couple hundred characters on screen: https://x.com/trq212/status/2014051501786931427

Meanwhile Boris: "Claude fixes most bugs by itself." while breaking the most trivial functionality all the time: https://x.com/bcherny/status/2030035457179013235 https://x.com/bcherny/status/2021710137170481431 https://x.com/bcherny/status/2046671919261569477 https://x.com/bcherny/status/2040210209411678369 while claiming they "test carefully": https://x.com/bcherny/status/2024152178273989085

cmrdporcupine 5 hours ago

Yeah you don't have to convince me. I switched to Codex mid-January in part because of the dubious quality of the TUI itself and the unreliability of the model. Briefly switched back through March, and yep, still a mistake. Once OpenAI added the $100 plan, it was kind of a no-brainer.

KronisLV 7 hours ago

> It felt like using a premium product, and it never felt like they were racing to keep up with the news cycle, or reply to competitors.

I don't know, their desktop app felt really laggy and even switching Code sessions took a few seconds of nothing happening. Since the latest redesign, however, it's way better, snappy and just more usable in most respects.

I just think that we notice the negative, disruptive things more. Even with the desktop app, the remaining flaws jump out. For example, the Chat / Cowork / Code modes only show the label for the currently selected mode, and the others are icons (that aren't very big). A colleague literally didn't notice that those modes are in the desktop app (or at least that that's where you switch to them).

spaniard89277 7 hours ago

Given the price I don't really think they're the best option. They're sloppy and competitors are catching up. I'm getting the same results with other models, and very close results with Kimi, which is waaay cheaper.

kilroy123 6 hours ago

I agree. It all feels so AI-sloppy now.

OtomotO 7 hours ago

I guess it's a bit of desperation to find a sustainable business model.

The AI hype is dying, at least outside the Silicon Valley bubble, which Hacker News is very much a part of.

That and all the dogfooding by slop-coding their user-facing application(s).