| ▲ | DRMacIver 4 hours ago | |
Ouch. Classic Claude. It does tend to cheat when it gets stuck, and I've had some success with stricter harnesses, reflection prompts and getting it to redo work when it notices it's cheated, but it's definitely not a solved problem. My guess is that you wouldn't have had a better time without PBT here and it would still have either cheated or claimed victory incorrectly, but definitely agreed that PBT can't fully fix the problem, especially if it's PBT that the agent is allowed to modify. I've still anecdotally found that the results are better than without it because even if agents will often cheat when problems are pointed out, they'll definitely cheat if problems aren't pointed out. | ||