Remix clone Hacker News

new | show | ask | jobs Github

	▲	PeterStuer 10 hours ago
		Looking at the very first test, it seems the system prompt already emphasizeses the success metric above the constraints, and the user prompt mandates success. The more correct title would be "Frontier models can value clear success metrics over suggested constraints when instructed to do so (50-70%)"