Remix clone Hacker News

	JB_5000 2 hours ago
		Interesting benchmark, but the methodology is worth noting: skills are generated before the task, with no feedback loop. In practice, useful skills tend to emerge from doing: you attempt, observe what failed, then codify what worked. That is a loop of generate → execute → observe → refine. The paper tests cold generation, which is a different (and less realistic) setup.
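To make the difference concrete, here is a rough sketch of the generate → execute → observe → refine loop next to cold generation. Everything in it is made up for illustration: `Skill`, `refine_loop`, the `REQUIRED` step list, and the toy `generate` / `execute` / `refine` functions are hypothetical stand-ins, not anything from the paper or its benchmark.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Skill:
    """Hypothetical skill: an ordered list of step names."""
    steps: list[str] = field(default_factory=list)


def refine_loop(
    generate: Callable[[], Skill],
    execute: Callable[[Skill], list[str]],
    refine: Callable[[Skill, list[str]], Skill],
    max_rounds: int = 5,
) -> tuple[Skill, int]:
    """Generate once, then execute/observe/refine until no failures remain.

    Returns the final skill and the number of execution rounds used.
    If max_rounds is exhausted, the last refinement is returned unexecuted.
    """
    skill = generate()                    # cold generation, no feedback yet
    for round_no in range(1, max_rounds + 1):
        failures = execute(skill)         # execute and observe what failed
        if not failures:
            return skill, round_no        # nothing left to fix
        skill = refine(skill, failures)   # codify what was learned
    return skill, max_rounds


# Toy task (an assumption for the demo): it succeeds only when the skill
# covers all of these steps.
REQUIRED = ["parse", "validate", "retry"]


def generate() -> Skill:
    # The cold draft only knows the obvious first step.
    return Skill(steps=["parse"])


def execute(skill: Skill) -> list[str]:
    # "Observation" here is just the list of required steps still missing.
    return [step for step in REQUIRED if step not in skill.steps]


def refine(skill: Skill, failures: list[str]) -> Skill:
    # Fix one observed failure per round.
    return Skill(steps=skill.steps + [failures[0]])


if __name__ == "__main__":
    final_skill, rounds = refine_loop(generate, execute, refine)
    print(final_skill.steps, rounds)
```

In this toy setup the cold draft from `generate()` alone still fails two of the three required steps, while the loop closes the gap after a few rounds. A cold-generation benchmark only ever measures the first of those two cases.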