ninjagoo 2 hours ago
> We're not reading the same numbers I think.

We must not be. That's why I listed the ones where it is barely competitive from @babelfish's table, which is itself extracted from pages 186 and 187 of the System Card, which has the comparison with Opus 4.6, GPT-5.4 and Gemini 3.1 Pro. Sure, it may be better than Opus 4.6 on some of those, but it manages only a small increase over GPT-5.4 on the ones I called out.
nimchimpsky an hour ago
Barely competitive? The Mythos column is the first column. You are the only person on Hacker News with this take; everyone else is saying "this is a massive jump". FWIW, the data you list shows the biggest jump I remember for Mythos.