devmor 2 hours ago
The biggest jump in the numbers they quoted is 6%. Please look at the columns OTHER than Opus as well.
josephg an hour ago
> Combined results (Claude Mythos / Claude Opus 4.6 / GPT-5.4 / Gemini 3.1 Pro)
> Terminal-Bench 2.0: 82.0% / 65.4% / 75.1% / 68.5%
> USAMO: 97.6% / 42.3% / 95.2% / 74.4%

> The biggest jump in the numbers they quoted is 6%.

Just in the numbers you quoted, that's a 16.6-point jump in Terminal-Bench and a 55.3-point absolute increase in USAMO over their previous Opus 4.6 model.
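For anyone who wants to check the arithmetic, here is a quick Python sketch using only the figures quoted above. The gaps are in percentage points, meaning one percentage minus another, not relative percent change. The function names (`point_gaps`, `gap_to_runner_up`) are my own and purely illustrative. The sketch computes Mythos's lead over each other column, and also its lead over the strongest non-Mythos column on each benchmark.

```python
# Scores as quoted in the thread (percent); column order matches the quoted header.
MODELS = ["Claude Mythos", "Claude Opus 4.6", "GPT-5.4", "Gemini 3.1 Pro"]
SCORES = {
    "Terminal-Bench 2.0": [82.0, 65.4, 75.1, 68.5],
    "USAMO": [97.6, 42.3, 95.2, 74.4],
}


def point_gaps(scores, models):
    """Per benchmark, Mythos's lead over each other model, in percentage points."""
    gaps = {}
    for bench, row in scores.items():
        lead = row[0]  # first column is Claude Mythos
        gaps[bench] = {
            # round() hides float noise such as 16.599999999999994
            model: round(lead - other, 1)
            for model, other in zip(models[1:], row[1:])
        }
    return gaps


def gap_to_runner_up(scores):
    """Per benchmark, Mythos's lead over the best-scoring non-Mythos column."""
    return {bench: round(row[0] - max(row[1:]), 1) for bench, row in scores.items()}


if __name__ == "__main__":
    for bench, per_model in point_gaps(SCORES, MODELS).items():
        for model, gap in per_model.items():
            print(f"{bench}: +{gap} points vs {model}")
    print(gap_to_runner_up(SCORES))
```

Against Opus 4.6 this gives 16.6 and 55.3 points. Against GPT-5.4, the strongest of the other columns on both benchmarks, the leads come out to 6.9 and 2.4 points.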