| ▲ | HardCodedBias 5 hours ago | |
If you believe another thread, the benchmarks compare Gemini-3 (probably with thinking) to GPT-5.1 without thinking. The person also claims that with thinking on, the gap narrows considerably. We'll probably have third-party benchmarks in a couple of days. | ||
| ▲ | iamdelirium 5 hours ago | parent [-] | |
It's easy to check that the numbers are for GPT 5.1 with thinking set to high. Just go to the leaderboard website and see for yourself: https://arcprize.org/leaderboard | ||