How can the community tell if models overfit to these benchmarks?
Mainly by the composition of evals: a model that aces one benchmark while lagging across the rest of the suite is a red flag. Secondary metrics like parameter count and token cost add another sanity check. Not perfect, but useful.
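As a rough illustration of the "composition of evals" idea, here is a minimal sketch that flags a benchmark as a possible overfit target when a model's score on it sits far above that model's scores across the rest of the suite. The model names, benchmark names, scores, and the z-score threshold are all hypothetical placeholders, not real results.

```python
from statistics import mean, stdev

# Hypothetical scores (0-100), for illustration only.
scores = {
    "model_a": {"bench_1": 91.0, "bench_2": 62.0, "bench_3": 58.0, "bench_4": 60.5},
    "model_b": {"bench_1": 71.0, "bench_2": 68.0, "bench_3": 66.5, "bench_4": 70.0},
}

def flag_possible_overfit(bench_scores: dict[str, float], z_threshold: float = 2.0) -> list[str]:
    """Return benchmarks whose score is an outlier versus the rest of the suite."""
    flagged = []
    for bench, score in bench_scores.items():
        # Compare this benchmark against the model's other scores.
        others = [s for b, s in bench_scores.items() if b != bench]
        mu, sigma = mean(others), stdev(others)
        if sigma > 0 and (score - mu) / sigma > z_threshold:
            flagged.append(bench)
    return flagged

for model, bench_scores in scores.items():
    print(model, flag_possible_overfit(bench_scores))
```

In practice you would also normalize by the secondary metrics mentioned above (parameter count, token cost), since an outsized score from a small, cheap model is more suspicious than the same score from a frontier one.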