Remix clone Hacker News

	unchar1 2 hours ago
		It's not just about figuring out whether a model is good at things, but whether it's good at the things I care about. A targeted eval suite (like a test suite) tells us that.
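		To make the test-suite analogy concrete, here's a rough sketch of what a targeted eval suite could look like. Everything in it is made up for illustration: `CASES`, `run_suite`, and `toy_model` are hypothetical names, and `toy_model` is just a stand-in for whatever real model call you'd use.

```python
from typing import Callable, List, Tuple

# Hypothetical eval cases: each pairs a prompt with a check on the output.
# In practice these would be the tasks you personally care about.
EvalCase = Tuple[str, Callable[[str], bool]]

CASES: List[EvalCase] = [
    ("What is 17 * 3?", lambda out: "51" in out),
    ("Name the capital of France.", lambda out: "paris" in out.lower()),
]


def run_suite(model: Callable[[str], str], cases: List[EvalCase]) -> float:
    """Return the fraction of cases whose check passes for this model."""
    passed = sum(1 for prompt, check in cases if check(model(prompt)))
    return passed / len(cases)


def toy_model(prompt: str) -> str:
    # Stand-in for a real model call (assumption, not a real API).
    return "51" if "17" in prompt else "I don't know"


score = run_suite(toy_model, CASES)
print(f"pass rate: {score:.0%}")
```

		Same idea as unit tests: the pass rate only means something because the cases were picked to match your own use case.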