Remix clone Hacker News


	▲	spectraldrift 4 hours ago
		Weird how they only share three hand-picked evals, ignoring the evals where they were left in the dust, like ARC-AGI2. This post is so misleading, I don't even know whether to trust the numbers they did share. One is just a fraction of a percentage point away from Gemini 3 Pro, which is awfully convenient for marketing and easy to hide. Very open, OpenAI.
	▲	XenophileJKO 3 hours ago
		Not really that weird. This isn't intended to be a "general" model. This is a coding model, so they showed the coding evals. The assumption would be that, relative to GPT5.1, non-coding evals would likely regress or be similar. Like when advertising a new airliner, most people don't care about how fast it taxis.