Remix.run Logo
PeterStuer 10 hours ago

Looking at the very first test, it seems the system prompt already emphasizeses the success metric above the constraints, and the user prompt mandates success.

The more correct title would be "Frontier models can value clear success metrics over suggested constraints when instructed to do so (50-70%)"