nimchimpsky 3 hours ago

Barely competitive? The Mythos column is the first column.

You are the only person on Hacker News with this take; everyone else is saying "this is a massive jump". FWIW, the data you list shows the biggest jump I remember, for Mythos.

devmor 2 hours ago

The biggest jump in the numbers they quoted is 6%.

Please look at the columns OTHER than Opus as well.

josephg an hour ago

> Combined results (Claude Mythos / Claude Opus 4.6 / GPT-5.4 / Gemini 3.1 Pro)

> Terminal-Bench 2.0: 82.0% / 65.4% / 75.1% / 68.5%

> USAMO: 97.6% / 42.3% / 95.2% / 74.4%

> The biggest jump in the numbers they quoted is 6%.

Just in the numbers you quoted, that's a 16.6 percentage-point jump on Terminal-Bench and a 55.3 percentage-point absolute increase on USAMO over their previous model, Opus 4.6.

devmor an hour ago

I don’t know if you’re willingly disregarding everything being said to you or there’s a language barrier here.