Remix clone Hacker News


	▲	HardCodedBias 5 hours ago
		If you believe another thread, the benchmarks are comparing Gemini-3 (probably with thinking) to GPT-5.1 without thinking. The person also claims that with thinking on, the gap narrows considerably. We'll probably have third-party benchmarks in a couple of days.
	▲	iamdelirium 5 hours ago
		It's easy to show that the numbers are for GPT-5.1 thinking (high). Just go to the leaderboard website and see for yourself: https://arcprize.org/leaderboard