culopatin, 18 hours ago:
But are those tests relevant? I tried using LLMs to write tests at work, and whenever I review them I end up asking, "OK great, it passes the test, but is the test relevant? Does it test anything useful?" And I get back, "Oh yeah, you're right, this test is pointless."
manmal, 17 hours ago:
Keep track of test coverage and ask it to delete tests without lowering coverage by more than, say, 0.01 percentage points. If you have a script that gives it only the test coverage, plus a file listing all tests with their line-number ranges, it's more or less a dumb task it can work on for hours without actually reading the files (which would fill context too quickly).
tlarkworthy, 10 hours ago:
We fixed this at work by instructing it to maximize coverage with minimal tests, which is closer to our coding style.
elbear, 9 hours ago:
Those tests were written by people. That's why they were confident that what the LLM implemented was correct.
wahnfrieden, 18 hours ago:
Yes, skill issue... and perhaps the wrong model + harness.