TacticalCoder 2 hours ago
> Given that for a number of these benchmarks, it seems to be barely competitive with the previous gen

I don't think we're reading the same numbers. Compared to Opus 4.6, it's a big jump on nearly every single benchmark GP posted. They're "only" catching up to Google's Gemini on GPQA and MMMLU, but even on those two they still beat their own Opus 4.6 results. This sounds like a much better model than Opus 4.6.
ninjagoo 2 hours ago
> We're not reading the same numbers I think.

We must not be. That's why I listed the benchmarks where it is barely competitive, taken from @babelfish's table, which in turn is extracted from pages 186 and 187 of the System Card, where it is compared with Opus 4.6, GPT-5.4 and Gemini 3.1 Pro. Sure, it may beat Opus 4.6 on some of those, but on the ones I called out it achieves only a small increase over GPT-5.4.