trunnell 5 days ago

https://status.anthropic.com/incidents/72f99lh1cj2c

They recently resolved two bugs affecting model quality, one of which was in production Aug 5-Sep 4. They also wrote:

  Importantly, we never intentionally degrade model quality as a result of demand or other factors, and the issues mentioned above stem from unrelated bugs. 
Sibling comments are claiming the opposite, attributing malice where the company itself says it was a screw-up. Perhaps we should take Anthropic at its word, and also recognize that model performance follows a probability distribution even for similar tasks, even without bugs making things worse.
kiratp 5 days ago | parent | next [-]

> Importantly, we never intentionally degrade model quality as a result of demand or other factors, and the issues mentioned above stem from unrelated bugs.

Things they could do that would not technically contradict that:

- Quantize the KV cache (see the sketch after this list)

- Data-aware model quantization, where their own evals show "equivalent perf" but overall model quality suffers.
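
For the unfamiliar, here's roughly what the first one means. A toy PyTorch sketch of int8 KV-cache quantization, purely illustrative (nobody's actual serving code); the point is that it halves cache memory per token, but the rounding error is nonzero:

  import torch

  def quantize_kv_int8(kv: torch.Tensor):
      # Per-head absmax scaling: fp16 -> int8 halves KV-cache memory,
      # at the cost of rounding error on every attended token.
      scale = kv.float().abs().amax(dim=-1, keepdim=True).clamp_min(1e-6) / 127.0
      q = (kv.float() / scale).round().clamp(-127, 127).to(torch.int8)
      return q, scale

  def dequantize_kv(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
      return (q.float() * scale).to(torch.float16)

  kv = torch.randn(1, 32, 4096, 128, dtype=torch.float16)  # (batch, heads, seq, head_dim)
  q, scale = quantize_kv_int8(kv)
  err = (dequantize_kv(q, scale) - kv).abs().mean()  # nonzero: the quality cost is real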

The simple fact is that physical compute takes a long time to deploy, yet somehow they're able to serve more and more inference from a slowly growing pool of hardware. Something has to give...

cj 5 days ago | parent [-]

> Something has to give...

Is training compute interchangeable with inference compute, or do training and inference have significantly different hardware requirements?

If training and inference hardware is pooled together, I could imagine a model where training simply fills in any unused compute at any given time (?)
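
Here's a toy sketch of that "training as filler" idea, just to make the scheduling question concrete. All names are made up; real cluster schedulers are far more involved:

  import heapq
  from dataclasses import dataclass, field

  @dataclass(order=True)
  class Job:
      priority: int                    # 0 = inference, 1 = training (filler)
      name: str = field(compare=False)
      gpus: int = field(compare=False)

  class Pool:
      def __init__(self, total_gpus: int):
          self.free = total_gpus
          self.pending = []

      def submit(self, job: Job):
          heapq.heappush(self.pending, job)
          # Inference sorts ahead of training, so training only ever
          # gets capacity that inference isn't using right now.
          while self.pending and self.pending[0].gpus <= self.free:
              j = heapq.heappop(self.pending)
              self.free -= j.gpus
              print(f"running {j.name} on {j.gpus} GPUs ({self.free} free)")

  pool = Pool(total_gpus=16)
  pool.submit(Job(1, "train-next-model", 12))  # filler soaks up idle GPUs
  pool.submit(Job(0, "serve-traffic", 8))      # stuck: needs training to give GPUs back

The last line is the hard part: reclaiming those GPUs means preempting and checkpointing a training job mid-run, which is nontrivial.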

kiratp 5 days ago | parent [-]

Hardware can be the same, but scheduling is a whole different beast.

Also, if you pull too many resources from training your next model to make inference revenue today, you'll fall behind in the larger race.

mh- 5 days ago | parent | prev | next [-]

The problem is twofold:

- They're reporting that it only impacted Haiku 3.5 and Sonnet 4. I used neither model during the time period I'm concerned with.

- It took them a month to publicly acknowledge that issue, so now we lack confidence there isn't another underlying issue going undetected (or undisclosed, less charitably) that affects Opus.

trunnell 5 days ago | parent | next [-]

> now we lack confidence there isn't another underlying issue

You can be confident there is a non-zero rate of errors and defects in any complex service that's moving as fast as the frontier model providers are!

mh- 5 days ago | parent [-]

Of course. Totally agree, and that's why (I think) I'm being as charitable as possible in this thread.

criemen 5 days ago | parent | prev [-]

They posted:

> We are continuing to monitor for any ongoing quality issues, including reports of degradation for Claude Opus 4.1.

I take that as acknowledgment that there might be an issue with Opus 4.1 (granted, still undetected), but not an undisclosed one, and that they're actively looking for it? I wouldn't jump to "they must be hiding things" yet. They're building, deploying, and scaling their service at an incredible pace; like all of us, they're bound to get some things wrong.

mh- 5 days ago | parent [-]

To be clear, I'm not one of the people suggesting they're doing something nefarious. As I said elsewhere, I don't know what my expectations are of them at this point. I'd like early disclosure of known performance drops, I guess. But from a business POV, I understand why they're not going to be updating a status page to say "things are worsening but we're not exactly sure why".

I'm also a realist, though, and have built a career on building and operating large systems. There's obviously a capability to dynamically shed load built into the system somewhere; there's just no other responsible way to engineer it. I'd prefer they slowed response times rather than harmed response quality, personally.
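
A minimal sketch of the flavor of shedding I mean: a fixed-size admission gate, so over capacity requests queue and latency visibly grows, instead of traffic being silently routed to a cheaper or degraded model. All names here are hypothetical, not anyone's actual stack:

  import asyncio

  MAX_CONCURRENT = 64                  # illustrative capacity limit

  gate = asyncio.Semaphore(MAX_CONCURRENT)

  async def full_quality_model(request: str) -> str:
      await asyncio.sleep(1.0)         # stand-in for real model latency
      return f"answer to {request!r}"

  async def handle(request: str) -> str:
      # Over capacity, requests wait here and p99 latency climbs,
      # rather than quality quietly dropping.
      async with gate:
          return await full_quality_model(request)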

claude_ya_ 5 days ago | parent | prev [-]

Does anyone know if this also affected Claude Sonnet models running in AWS Bedrock, or if it was just when using the model via Anthropic’s API?