| ▲ | highfrequency 7 hours ago | |
Did they at least rule out an easy prompt fix? "Stick to the spirit of the problem and don't cheat (e.g., by reverse engineering the test cases or source code)" | ||
| ▲ | pongogogo 40 minutes ago | parent [-] | |
They note in the paragraph I quoted at the top that prompting has a big impact on behaviour, so yes, this would work. I don't think that's what METR is interested in, though. | ||