Remix clone Hacker News


	jmalicki 2 hours ago
		I think the point of the paper is to prod benchmark authors to at least try to make their benchmarks a little more secure and harder to hack, especially now that AI is getting smart enough to hack evaluation environments on its own, even when the authors never intended that.