▲ nopinsight 2 hours ago
I assume you're using the "regular" Pro version of Gemini 3.1 for the above, rather than the Deep Think mode, which is more comparable to GPT-5.5 Pro. To my knowledge, regular 3.1 Pro is a tier below and often makes mistakes.

Moreover, there's no reason to believe the progress of LLMs, which couldn't reliably solve high-school math problems just 3–4 years ago, will stop anytime soon. You might want to track the progress of these models on the CritPt benchmark, which is built on *unpublished, research-level* physics problems. Frontier models are still nowhere near solving it, but progress has been rapid:

* o3 (high), <1.5 years ago: 1.4%
* GPT-5.4 (xhigh): 23.4%
* GPT-5.5 (xhigh): 27.1%
* GPT-5.5 Pro (xhigh): 30.6%
▲ civvv 4 minutes ago | parent
There are many indications that model progress is slowing down, so that claim is not entirely accurate.