▲ | ants_everywhere 2 days ago | |
IMO by far the best improvement would be to make it easier for the agent to force the agent to use a success criterion. Right now it's not easy prompting claude code (for example) to keep fixing until a test suite passes. It always does some fixed amount of work until it feels it's most of the way there and stops. So I have to babysit to keep telling it that yes I really mean for it to make the tests pass. |