What makes it particularly tricky to evaluate is that other bugs could still be lurking, given how long these went without even an acknowledgement until now, and they did state they are still looking into potential Opus issues.

I'll probably come back and try a Claude Code subscription again, but I'm good for the time being with the alternative I found. I also suspect the subscription model isn't going to work for me long term; a pay-per-use approach (possibly with reserved time, like we have for cloud compute) where I can swap models with low friction is far more appealing.

data-ottawa 5 days ago | parent [-]

Benchmarks are too expensive for ordinary users to run, but it would be useful if they published their benchmark results against prod over time. That would expose degradations in a more objective manner.

Of course, there’s always the problem of teaching to the test and of out-of-test degradations, but presumably bugs would be independent of that.

	rapind 5 days ago | parent [-]
		A few weeks ago Reddit was on fire with reports of outages and timeouts, and yet the Anthropic Jira status page was showing everything as green. So even if they had benchmarks, I'm not sure they'd be transparent with them.