nerevarthelame 5 hours ago

It's interesting they only included 6 metrics this time. Opus 4.7 had 12, and 4.6 had 13.

Of the metrics they reported for 4.7, they dropped BrowseComp, CharXiv Reasoning, CyberGym, GPQA Diamond, MCP Atlas, MMMLU, and SWE-bench Verified for 4.8. The last four were almost always mentioned in previous Opus releases.

onlyrealcuzzo 5 hours ago

Gonna assume it's because those scores barely budged or moved downward, and most of their reported benchmark results are probably within sampling error...
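For a rough sense of what "within sampling error" means on a benchmark, here is a minimal sketch. It treats each question as an independent pass/fail trial, so an accuracy measured on n questions has a binomial standard error of sqrt(p(1 - p) / n). The question count and scores below are hypothetical, not any lab's reported numbers, and the helper names are made up for illustration. It also treats the two models' scores as independent samples, which is a simplification; a paired comparison on the same questions can give a tighter interval.

```python
import math


def binomial_se(accuracy: float, n_questions: int) -> float:
    """Standard error of an accuracy measured on n independent pass/fail questions."""
    return math.sqrt(accuracy * (1 - accuracy) / n_questions)


def diff_within_noise(acc_old: float, acc_new: float, n_questions: int, z: float = 1.96):
    """Return (within_noise, se_diff) for a score change.

    Hypothetical helper: treats the two scores as independent binomial
    proportions and checks whether the change falls inside a ~95% interval
    (z = 1.96) for their difference.
    """
    se_diff = math.sqrt(
        binomial_se(acc_old, n_questions) ** 2 + binomial_se(acc_new, n_questions) ** 2
    )
    return abs(acc_new - acc_old) <= z * se_diff, se_diff


# Hypothetical example: a 200-question benchmark where the score moves from 85.0% to 86.5%.
within, se = diff_within_noise(0.850, 0.865, 200)
print(f"SE of difference: {se:.3f}, within noise: {within}")
```

With only a couple hundred questions, a 1.5-point gain sits well inside the noise band (about plus or minus 7 points here), which is the kind of result a vendor might reasonably leave out of a headline table.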

hyperpape 5 hours ago

They will release a system card, and you can then confirm or disconfirm your assumptions.