Remix clone Hacker News

	▲	criley2 2 days ago
		While it's true that model makers are increasingly trying to game benchmarks, it's also true that benchmark-chasing is lowering model quality. GPT-5, 5.1, and 5.2 have been panned by nearly every class of user despite being benchmark monsters. In fact, the more OpenAI tries to benchmark-max, the worse its models seem to get.
	▲	astrange 2 days ago | parent
		Hm? 5.1 Thinking is much better than 4o or o3. Just don't use the instant model.
	▲	malnourish a day ago | parent
		5.2 is a solid model, and I'm actually impressed with M365 Copilot when it runs on 5.2.