Remix clone Hacker News

new | show | ask | jobs Github

	▲	girvo 3 hours ago
		Flawed? Possibly, but I think it's more that any kind of benchmark then becomes a target, and is inherently going to be a "lossy" signal as to the models actual ability in practice. Quantisation doesn't help, but even running full fat versions of these models through various cloud providers, they still don't match Sonnet in actual agentic coding uses: at least in my experience.