Remix clone Hacker News

	▲	dyauspitr 4 hours ago
		Deepseek v4 is still pretty far behind the frontier models though.
	▲	BobbyJo 3 hours ago
		It's really hard to tell. Almost all the models have the benchmarks in their training data, which pushes us into the realm of basing model capability rankings on vibes. I think the OSS models tend to do worse on things outside their corpus, but Deepseek specifically has done insanely good work on efficiency and scaling, which is verifiable in a way capabilities are not.