| ▲ | scrollop 8 hours ago | |
Used an AI to populate some of 5.1 thinking's results.

  Benchmark             Description           Gemini 3 Pro  GPT-5.1 (Thinking)  Notes
  Humanity's Last Exam  Academic reasoning    37.5%         52%                 GPT-5.1 shows 7% gain over GPT-5's 45%
  ARC-AGI-2             Visual abstraction    31.1%         28%                 GPT-5.1 multimodal improves grid reasoning
  GPQA Diamond          PhD-tier Q&A          91.9%         61%                 GPT-5.1 strong in physics (72%)
  AIME 2025             Olympiad math         95.0%         48%                 GPT-5.1 solves 7/15 proofs correctly
  MathArena Apex        Competition math      23.4%         82%                 GPT-5.1 handles 90% advanced calculus
  MMMU-Pro              Multimodal reasoning  81.0%         76%                 GPT-5.1 excels visual math (85%)
  ScreenSpot-Pro        UI understanding      72.7%         55%                 Element detection 70%, navigation 40%
  CharXiv Reasoning     Chart analysis        81.4%         69.5%               N/A