starkeeper 7 hours ago

On a similar note I recently deleted a whole bunch of automated tests because if the AI is going to write most of the code then I should test it to make sure it's good! This won't work for all projects, but for my indie games it's a good idea.

rectang 7 hours ago | parent [-]

> I recently deleted a whole bunch of automated tests because if the AI is going to write most of the code then I should test it to make sure it's good!

??

You say you deleted the tests because you "should test it"? That logic seems inconsistent.

Sanity checking LLM-generated code with LLM-generated automated tests is low-cost and high-yield because LLMs are really good at writing tests.

0xfaded 7 hours ago | parent | next [-]

I think LLMs are really bad at writing tests. In the good old days you invested in making your test code structured and understandable. Now we all just say "test this thing you just generated".

I shipped a really embarrassing off-by-one error recently because some polygon representations repeat the first vertex at the end of the ring as a closing sentinel (WKT and KML do this). When I checked the "tests", there was a generated test that asserted that a square has 5 vertices.
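A minimal sketch of the pitfall being described, assuming a simple hole-free WKT polygon and a naive hand-rolled parser (the parsing code here is illustrative, not a real WKT library): the exterior ring's coordinate list repeats the first vertex to close the ring, so a square shows up as 5 raw points even though it has only 4 distinct vertices.

```python
# WKT closes each ring by repeating the first vertex as the last point.
wkt = "POLYGON ((0 0, 1 0, 1 1, 0 1, 0 0))"

# Naive parse of the single exterior ring (assumes no interior rings).
coords = wkt[wkt.index("((") + 2 : wkt.index("))")]
ring = [tuple(map(float, p.split())) for p in coords.split(",")]

assert len(ring) == 5        # raw point count includes the closing sentinel
assert ring[0] == ring[-1]   # the ring is explicitly closed

vertices = ring[:-1]         # drop the sentinel before counting vertices
assert len(vertices) == 4    # the square's actual vertex count
```

A generated test that asserts `len(ring) == 5` is merely restating the serialization format; a meaningful test would assert the distinct-vertex count after dropping the closing point.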

rectang 6 hours ago | parent [-]

I suppose my generalization was too broad: LLMs can be either good or bad at writing tests depending on your workflow and expectations.

I'm closely supervising the LLM, giving it fine-grained instructions — I generally understand the full interface design and most times the whole implementation (though sometimes I skim). When I have the LLM write unit tests for me, it writes essentially what I would have written a couple years ago, except that it tends to be more thorough and add a few more tests I wouldn't have had the patience to write. That saves me quite a bit of time, and the LLM-generated unit tests are probably somewhat better than what I would have written myself.

I won't say that I never see brain-dead mistakes of the "5-vertex square" variety (haha) — by their nature, LLMs tend towards consistency rather than understanding after all. But I've been using Claude Opus exclusively for a while and it doesn't tend to make those mistakes nearly as often as I used to see with lower-powered LLMs.

otabdeveloper4 7 hours ago | parent | prev [-]

> ...because LLMs are really good at writing tests.

No, they're absolutely shit at writing tests. Writing tests is mostly about risk and threat analysis, which LLMs can't do.

(This is why LLMs write "tests" that check if inputs are equal to outputs or flip `==` to `!=`, etc.)