Remix clone Hacker News

	▲	thevinter 2 hours ago
		Are you intentionally keeping the benchmarks private?
	▲	XCSme an hour ago
		Yes. I am trying to work out the best way to share as much information as possible about how the AI models fail without revealing anything that could help them overfit on those specific tests. I am planning to add some extra LLM calls that summarize the failure reason without revealing the test.