why does this even occur? if it's merely compute limitations, why not just 429 some requests?

Have you run a system in production? There are a multitude of reasons that a system can go down. There's no indication so far from Anthropic that this was merely compute limitations.

▲

KronisLV 8 hours ago | parent | next [-]

> There are a multitude of reasons that a system can go down.

Start doing post mortems then!

At the very least, them using any off the shelf service that's shitting the bed would inform others to stay away from it - like an IAM solution, or maybe a particular DB in a specific configuration backing whatever they've written, or a given architecture for a given scale.

Right now it's completely like a black box that sometimes goes down and we don't get much information about why it's so much less stable than other options (hey, if they just came out and said "We're growing 10x faster than we anticipated and system X, Y and Z are not architected for that." that'd also be useful signal).

Or, who knows, maybe it's just bad deploys - seems like it's back for me and claude.ai UI looks a bit different hmmm.

	▲	SpicyLemonZest 7 hours ago \| parent [-]
		I have no inside knowledge of Anthropic. But having done a lot of postmortems in general, one of the key dynamics that routinely comes up is "we know we keep shipping breakages, and we know these new procedures would prevent many of them, but then we wouldn't be able to deliver new stuff so quickly". Given where Anthropic is at and what they believe about the future of software development, that's a tradeoff that they may very well be intentionally not making.

▲

lionkor 8 hours ago | parent | prev | next [-]

Its most likely a "You're totally right, this fix broke production! Let me fix it"

▲

consumer451 8 hours ago | parent | prev [-]

Yeah, this is not just inference. First thing for me was an MCP I use went down in Claude Code, models still worked. Now "API Error: 529 Authentication service is temporarily unavailable."