Remix clone Hacker News


	operatingthetan 5 hours ago
		Would creating new benchmarks every month solve this problem?
	preciousoo 5 hours ago
		Or create "blind" benchmarks: 10 groups of 3 researchers, each with their own benchmark that they don't share. (Testing a model without its developers knowing is a separate problem; maybe the groups only run their benchmarks once the general public has access to the model.) That gives you 10 different tests. Then aggregate the pass rates.
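The "aggregate pass rates" step could look roughly like the sketch below. It assumes each group reports only a pass/fail result per task from its private benchmark, and it reads "aggregate" as an unweighted average of the groups' pass rates, so that a group with a larger benchmark doesn't dominate. Pooling every task into one rate would be an equally reasonable reading. The function name `aggregate_pass_rates` and the group names are hypothetical, for illustration only.

```python
from statistics import mean


def aggregate_pass_rates(results: dict[str, list[bool]]) -> tuple[dict[str, float], float]:
    """Return each group's pass rate and the unweighted (macro) average.

    `results` maps a hypothetical group name to the pass/fail outcome of
    each task in that group's private, unshared benchmark. Only these
    outcomes leave the group, never the tasks themselves.
    """
    per_group = {
        group: sum(outcomes) / len(outcomes)  # True counts as 1, False as 0
        for group, outcomes in results.items()
        if outcomes  # skip groups that reported no tasks (avoids dividing by zero)
    }
    # Every group counts equally, regardless of how many tasks it has.
    # statistics.mean raises StatisticsError if no group reported anything.
    return per_group, mean(per_group.values())


if __name__ == "__main__":
    # Hypothetical results from three of the groups.
    results = {
        "group_a": [True, True, False, True],
        "group_b": [False, True],
        "group_c": [True, True, True, True, False],
    }
    per_group, overall = aggregate_pass_rates(results)
    print(per_group)          # {'group_a': 0.75, 'group_b': 0.5, 'group_c': 0.8}
    print(round(overall, 4))  # 0.6833
```

With 10 groups the call stays the same; only the `results` dict grows. Because each group shares only its outcomes, the benchmarks themselves stay private.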