Remix.run Logo
p1necone 15 hours ago

I've used llms to help me write large sets of test cases, but it requires a lot of iteration and the mistakes it makes are both very common and insidious.

Stuff like reimplementing large amounts of the code inside the tests because testing the actual code is "too hard", spending inordinate amounts of time covering every single edge case on some tiny bit of input processing unrelated to the main business logic, mocking out the code under test, changing failing tests to match obviously incorrect behavior... basically all the mistakes you expect to see totally green devs who don't understand the purpose of tests making.

It saves a shitload of time setting up all the scaffolding and whatnot, but unless they very carefully reviewed and either manually edited or iterated a lot with the LLM I would be almost certain the tests were garbage given my experiences.

(This is with fairly current models too btw - mostly sonnet 4 and 4.5, also in fairness to the LLM a shocking proportion of tests written by real people that I've read are also unhelpful garbage, I can't imagine the training data is of great quality)