9rx · 15 hours ago
AI has become very good at writing pointless and bad tests, at least. It remains difficult to compel it to write good tests consistently. But even if it wrote great tests every time, the trouble is that testing was designed around the idea of double-entry accounting. Even great tests can test the wrong thing. In the old world you would write a test case and then implement something to satisfy it. If both sides of the ledger agree, so to speak, you can be pretty confident that both are correct. In other words, going through the process of implementation gives you an opportunity to make sure the test you wrote isn't ill-conceived or broken itself.

If you only write the tests, or only write the implementation, or write none of it, there is no point at which you can validate your work. If you have already built up an application and are reusing its test suite to reimplement the software in another language, as in the case above, that is one thing. But in greenfield work, how to validate the work once you involve AI agents remains an open problem. Another article posted here recently suggested going back to manual testing to validate the work, but that seems like a non-solution.
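To make the ledger analogy concrete, here is a minimal pytest-style sketch. The `parse_price` function and its behaviour are invented for illustration, not taken from the thread; the point is only that the test was written first and the implementation must independently agree with it.

```python
# One side of the ledger: the implementation.
# (Hypothetical function, for illustration only.)
def parse_price(text: str) -> float:
    """Turn a price string like '$1,234.50' into a float."""
    return float(text.strip().lstrip("$").replace(",", ""))


# The other side of the ledger: the test case, written first.
# The implementation is only trusted once both sides agree.
def test_parse_price():
    assert parse_price("$1,234.50") == 1234.50
    assert parse_price("  $0.99 ") == 0.99
```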
visarga · 8 hours ago (in reply)
Every error is a signal that you need better tests. You can let the LLM create a test for every error it stumbles into, on top of all the regular tests it can write on its own. Add every test scenario you can think of, since you are not writing them by hand. A bad test is invalidated by the code, and bad code is invalidated by the tests, so between the two the AI agent can become reliable.
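A rough sketch of that loop, again in pytest style. The `slugify` function and the specific failure are invented stand-ins for whatever the agent is iterating on; the idea is simply that each error it hits gets pinned down as a permanent regression test.

```python
# Hypothetical code the agent is iterating on.
def slugify(title: str) -> str:
    return "-".join(title.lower().split())


# Regular test the agent wrote on its own.
def test_slugify_basic():
    assert slugify("Hello World") == "hello-world"


# Regression test added after the agent stumbled on padded input:
# the bad behaviour is now invalidated by the suite on every future run.
def test_slugify_ignores_extra_whitespace():
    assert slugify("  Hello   World ") == "hello-world"
```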