This puts Sonnet 4.6 above Opus 4.6 in the coding index... kinda hard to trust those numbers.

(Also, it puts Opus 4.7 universally above Opus 4.6, and I may be wrong, but this doesn't seem to match the experience of most/many/some people. I think it's widely recognized that Anthropic is severely lacking compute and Opus 4.7 is a cost-saving measure.)

conception 2 hours ago

What I’ve usually seen is 4.7 -> 4.5 -> 4.6 in terms of quality. Though 4.7 seems to hallucinate more than before.

manmal 4 hours ago

Anthropic themselves have (had?) this thing where Opus is used for planning and Sonnet for coding.

nextaccountic 3 hours ago

I thought this was a cost-saving measure: we plan with the frontier model / SOTA, then code with something cheaper. But then, Anthropic employees don't have rate limits, right?