palcu 2 days ago

Hey folks, I'm Alex from the reliability engineering team at Anthropic. We've just posted the retrospective for this incident:

> On March 26–27, 2026, customers experienced elevated error rates when using Claude Opus 4.6 and Claude Sonnet 4.6. The issue was caused by a networking performance degradation within our cloud infrastructure that disrupted communication between components of our serving stack. We resolved the incident by migrating the affected workloads to healthy infrastructure, restoring normal service by 9:30 AM PT on March 27.

https://status.claude.com/incidents/b9802k1zb5l2

halJordan 2 days ago

Is it really an answer to say "network disruption" with a bunch of $10 words? Certainly it doesn't belong here of all places.

nerdsniper 2 days ago

It’s definitely an answer! Maybe just not a “retrospective”?

cedws 2 days ago

Are you able to share if there's a general trend behind the outages? Do you often hit capacity, or do you budget to have headroom?

palcu a day ago

Yes, the general trend is the unprecedented growth we've seen. Typically you would have some lead time to re-engineer the systems to support the increase in traffic and users. But we're dealing with very compressed timelines, and while most of the time we're able to fix the issues beforehand, sometimes we have to fix them in production. Sorry about that.
