fauigerzigerk 3 hours ago:
I'm not opposed to AI-generated code in principle. I'm just saying that we don't know how much effort was put into making this, and we don't know whether it works. The existence of a repository containing hundreds of files, thousands of SLOC, and a folder full of tests tells us less today than it used to.

There's one thing in particular that I find quite astonishing sometimes. I don't know about this particular project, but some people use LLMs to generate both the implementation and the test cases. What does that mean? The test cases are supposed to be the formal specification of our requirements. If we do not specify formally what we expect a tool to do, how do we know whether the tool has done what we expected, including in edge cases?
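To make that concrete, here's a rough sketch of what I mean by human-written tests pinning down the requirements, edge cases included. The module and function (`mytool.parse_duration`) and its behaviour are made up for illustration, not taken from this project:

```python
# Hypothetical example: human-written tests that state the requirement,
# including an edge case, independently of how the implementation was produced.
import unittest

from mytool import parse_duration  # hypothetical module and function


class ParseDurationRequirements(unittest.TestCase):
    def test_simple_minutes(self):
        # Requirement: "5m" means 300 seconds.
        self.assertEqual(parse_duration("5m"), 300)

    def test_mixed_units(self):
        # Requirement: units can be combined; "1h30m" is 5400 seconds.
        self.assertEqual(parse_duration("1h30m"), 5400)

    def test_empty_input_is_an_error(self):
        # Edge case decided by the human, not inferred from the implementation:
        # empty input must raise, not silently return 0.
        with self.assertRaises(ValueError):
            parse_duration("")


if __name__ == "__main__":
    unittest.main()
```

If the same LLM writes both the implementation and the tests, the tests tend to assert whatever the implementation already does, and the edge-case decisions above never get made by anyone.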
teiferer an hour ago (in reply):
I fully agree with your overall message and sentiment. But let me be nit-picky for a moment.

> The test cases are supposed to be the formal specification of our requirements

Formal methods folks would strongly disagree with this statement. Tests are informal specifications in the sense that they don't provide a formal (mathematically rigorous) description of the full expected behavior of the system. Instead, they offer a mere glimpse into what we hope the system will do.

That glimpse is still the important part, and it's where your main point stands: the tests are what confirm that the thing the LLM built behaves as the human expected in the cases the human cared about. That's why the human needs to provide them. (The human can take help from an LLM to write the tests, in the sense of giving an even-more-informal natural-language description of what a test should do. But the human then needs to make sure the test really does that, and perhaps fill in some gaps.)
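To illustrate the distinction (just a sketch with a toy function, nothing to do with the project under discussion): example-based tests check a handful of hand-picked points, and even a property-based test only samples the input space against a stated property. Neither is a proof over all inputs, which is what a formal specification plus verification would give you.

```python
# Sketch only: contrasting an example-based test with a property-based one.
# Neither is a formal specification; both sample the input space.
from hypothesis import given, strategies as st


def dedupe(xs):
    # Toy implementation used purely for illustration.
    return list(dict.fromkeys(xs))


def test_examples():
    # A few hand-picked points: a glimpse of the expected behavior.
    assert dedupe([]) == []
    assert dedupe([1, 1, 2]) == [1, 2]


@given(st.lists(st.integers()))
def test_property_no_duplicates(xs):
    # A property checked on randomly sampled inputs: broader coverage,
    # but still not a mathematically rigorous description of full behavior.
    result = dedupe(xs)
    assert len(result) == len(set(result))
    assert set(result) == set(xs)
```

The value of either kind of test here is that a human decided what the property or the expected outputs should be; that decision is the part an LLM can't be left to invent on its own.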