nopinsight 2 hours ago

I assume you're using the "regular" Pro version of Gemini 3.1 for the above, rather than the Deep Think mode, which is more comparable to GPT-5.5 Pro. To my knowledge, regular 3.1 Pro is a tier below and often makes mistakes.

Moreover, there's no reason to believe the progress of LLMs, which couldn't reliably solve high-school math problems just 3–4 years ago, will stop anytime soon.

You might want to track the progress of these models on the CritPt benchmark, which is built on *unpublished, research-level* physics problems:

https://critpt.com/

Frontier models are still nowhere near solving it, but progress has been rapid.

* o3 (high), <1.5 years ago: 1.4%

* GPT-5.4 (xhigh): 23.4%

* GPT-5.5 (xhigh): 27.1%

* GPT-5.5 Pro (xhigh): 30.6%

https://artificialanalysis.ai/evaluations/critpt

civvv 4 minutes ago

That isn't entirely accurate: there are many indications that model progress is slowing down.