highfrequency 7 hours ago

Did they at least rule out an easy prompt fix? "Stick to the spirit of the problem and don't cheat (e.g., by reverse engineering the test cases or source code)"

pongogogo 40 minutes ago

They note in the paragraph I quoted at the top that prompting has a big impact on behaviour, so yes, this would work. I don't think that's what METR are interested in, though.