CuriouslyC 3 days ago
I've never had an LLM try to run a formatter on my code, and I've probably logged a few thousand hours driving agents (4+ agents at once for most of those). Fuzzing makes the semantics slightly less immediately obvious, but LLMs are more robust to this than you or I. The biggest difference is the reduction in memorization carryover. If it feels like too different a test for you, I'm not sure what to tell you, but I know the world would appreciate a better way to test for training set contamination if you can figure one out.