Remix clone Hacker News

	cromka 2 hours ago
		That's exactly why there are a ton of different benchmarking suites for evaluating hardware performance. I reckon we'll have similar suites comparing different aspects of models. And, at some point, we'll be dealing with models skewing results whenever they detect they're being benchmarked, as happened before with hardware. Some say that's already happening with the pelican test.