Remix clone Hacker News

	▲	trollbridge 4 days ago
		I now evaluate models by how good they are at removing code. It’s fairly simple (assuming the test harness and agents.md are well written): run iterations of trying to remove code, check that the tests still pass after each one, then have a human review the result. Less code to review that way.
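
		A rough sketch of that loop, in case it helps. Everything here is made up for illustration: `shrink`, `drop_one_line`, and `run_tests` are hypothetical names, `drop_one_line` is a toy stand-in for the model proposing removals, and `run_tests` is a toy stand-in for a real test harness. What comes out the other end is what a human would review.

```python
from typing import Callable, List, Tuple

Code = List[str]


def shrink(code: Code,
           propose: Callable[[Code], List[Code]],
           passes: Callable[[Code], bool],
           max_rounds: int = 10) -> Tuple[Code, int]:
    """Each round, accept the first proposed smaller version that still passes."""
    if not passes(code):
        raise ValueError("baseline must pass before removing anything")
    removed = 0
    for _ in range(max_rounds):
        for candidate in propose(code):
            if len(candidate) < len(code) and passes(candidate):
                removed += len(code) - len(candidate)
                code = candidate
                break
        else:
            break  # no proposal survived the tests, so stop iterating
    return code, removed


def drop_one_line(code: Code) -> List[Code]:
    """Toy stand-in for the model: propose every single-line deletion."""
    return [code[:i] + code[i + 1:] for i in range(len(code))]


def run_tests(code: Code) -> bool:
    """Toy stand-in for the harness: run the code and check add(2, 3) == 5."""
    namespace: dict = {}
    try:
        exec("\n".join(code), namespace)
        return namespace["add"](2, 3) == 5
    except Exception:
        return False


original = [
    "import os",
    "def unused(): return 42",
    "def add(a, b):",
    "    return a + b",
]
reduced, removed = shrink(original, drop_one_line, run_tests)
# `reduced` is what goes to the human reviewer; `removed` is the score.
```

		The number of lines removed while the tests stay green is the thing being measured. It is only as trustworthy as the harness: anything the tests don't cover can be deleted without the loop noticing, which is why the human review step stays at the end.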