deepdarkforest 4 days ago

Wow. Sneaky. As far as I can tell, they do not even state the rate of impact for the XLA bug, which affected everyone, not just Claude Code users. Very vague. Interesting.

Claude Code has made almost half a billion so far[1] (>500m in ARR, and it's like 9 months old), and 30% of all users have been impacted at least once, just from the first routing bug. Scary stuff.

Their post-mortem is basically "evaluations are hard, we relied on vibe checking, now we are going to have even more frequent vibe checking". I believe it was indeed unintentional, but in a future where investors' money won't come down from the skies, serving distilled models will be very tempting. And there is no SLA anyone can be held liable under currently; it's just vibes. I wonder how enterprise vendors are going to deal with this going forward. You cannot just degrade quality without the client or vendor even being able to really prove it.

[1][https://www.anthropic.com/news/anthropic-raises-series-f-at-...]

extr 4 days ago | parent | next [-]

Is your contention that paying for a service entitles you to zero bugs, ever?

deepdarkforest 4 days ago | parent | next [-]

Of course not! But usually, you can quantify metrics for quality, like uptime, lost transactions, response time, throughput, etc. Then you can have accountability, and remediate. Even for other bugs, you can often reproduce the problem and clearly show the impact. But in this case, other than internal benchmarks, you cannot really prove it. There is no accountability yet.

_zoltan_ 4 days ago | parent [-]

why would they publish the data you seek? I would not publish it either.

the blog explains what issues they had and how they fixed them. this is good enough.

gabriel666smith 4 days ago | parent | prev | next [-]

We already kind of have a solution for this with SLAs. Humans, being (probably) non-deterministic, also fuck up. An expectation of a level of service is, I think, reasonable. It's not "zero mistakes ever", just as it can't be "zero bugs ever".

We're firmly in the realms of 'this thing is kind of smarter / faster at a task compared to me or my employees, so I am contracting it to do that task'.

That doesn't mean 'if it fails, no payment'.

But I think it's too analogous to non-tech products to hide behind a 'no refunds' policy. It's that good, so there are consequences for it, I think.

flutas 4 days ago | parent | prev [-]

If you paid for a streaming service and the HD option only worked for a random subset of users, and not you, would you complain?

It's a material difference in the product, not just "a bug."

dylan604 4 days ago | parent [-]

I'd honestly blame my ISP for traffic shaping my connection as a first assumption, and not immediately blame the streaming platform.

VirusNewbie 4 days ago | parent | prev [-]

They likely don't want to say how much of their inference comes from GCP vs. AWS.