Remix clone Hacker News


	mrandish 3 hours ago
		> Yeah, these benchmarks are bogus. It's not just overfitting to the leading benchmarks; there are also too many degrees of freedom in how a model is tested (harness, etc.). Until there's standardized documentation enabling independent replication, it's all just benchmarketing.
	fooker 3 hours ago | parent
		For the current state of AI, the harness is unfortunately part of the secret sauce.