| ▲ | benchwright 2 days ago | |
Tends to be a problem. I've tried to mitigate it either with external harnesses (aka GitHub Actions that are "fixed" based on known-good) or with n witness agents (e.g. Kimi/Qwen/whatever <=> Claude/OpenAI/Google). Generally sucks up more time and energy (and now tokens/$). That being said, I still have a "fix the code, not the test" line somewhere in here... | ||