The biggest jump in the numbers they quoted is 6%.

Please look at the columns OTHER than Opus as well.

> Combined results (Claude Mythos / Claude Opus 4.6 / GPT-5.4 / Gemini 3.1 Pro)

> Terminal-Bench 2.0: 82.0% / 65.4% / 75.1% / 68.5%

> USAMO: 97.6% / 42.3% / 95.2% / 74.4%

> The biggest jump in the numbers they quoted is 6%.

Just in the numbers you quoted, that's a 16.6-point jump on Terminal-Bench and a 55.3-point absolute increase on USAMO over their previous Opus 4.6 model.
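For anyone who wants to check the arithmetic, here is a minimal sketch. The scores are copied from the quoted results above; the function names (`point_gain`, `relative_gain`, `gains_over_opus`) are made up for illustration. It reports both the absolute gain in percentage points and the relative gain, since the two are easy to mix up.

```python
# Scores copied from the quoted results, in percent.
# Column order: Claude Mythos, Claude Opus 4.6, GPT-5.4, Gemini 3.1 Pro.
SCORES = {
    "Terminal-Bench 2.0": (82.0, 65.4, 75.1, 68.5),
    "USAMO": (97.6, 42.3, 95.2, 74.4),
}


def point_gain(new: float, old: float) -> float:
    """Absolute difference in percentage points, rounded to one decimal."""
    return round(new - old, 1)


def relative_gain(new: float, old: float) -> float:
    """Relative change as a percentage of the old score, rounded to one decimal."""
    return round(100.0 * (new - old) / old, 1)


def gains_over_opus() -> dict:
    """Map each benchmark to (point gain, relative gain) of Mythos over Opus 4.6."""
    result = {}
    for bench, (mythos, opus, _gpt, _gemini) in SCORES.items():
        result[bench] = (point_gain(mythos, opus), relative_gain(mythos, opus))
    return result


if __name__ == "__main__":
    for bench, (points, relative) in gains_over_opus().items():
        print(f"{bench}: +{points} points ({relative}% relative)")
```

Either way you measure it, both gaps over Opus 4.6 are well above 6.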

	devmor an hour ago
		I don’t know if you’re willfully disregarding everything being said to you or if there’s a language barrier here.