jmalicki | 3 hours ago
It depends: some very difficult tasks are still easily verifiable. For instance, if you're working on a compiler and have a huge corpus of sample code that itself has tests, you can demand "all sample code must compile and pass its tests, and your new optimizer code must get adequate branch coverage in the process." The underlying task can be very difficult, but that large amount of test coverage has a very good chance of catching errors.

At the very least, "LLM code compiles, and is formatted and documented according to lint rules" is pretty basic. If people are saying LLM code doesn't compile, then yes, they're using it very incorrectly: they're not even beginning to engage the agentic loop, and compiling is the simplest step.

Sure, a lot of more complex cases require oversight or don't work. But "the code didn't compile" is definitely in "you're holding it wrong" territory, and it's not even subtle.
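To make the idea concrete, here's a minimal sketch of the kind of automated verification gate described above, in Python. Everything here is hypothetical illustration, not anyone's actual tooling: `run_gates`, the `double` function used as the test target, and the gate order (compile first, then test) are all assumptions; a real setup would swap in your build, test, and lint commands and feed the failure messages back to the agent.

```python
import os
import subprocess
import sys
import tempfile

def run_gates(source: str) -> list[str]:
    """Run candidate code through cheap verification gates.

    Gate 1: the code must byte-compile (no execution).
    Gate 2: a small test must pass against the module.
    Returns a list of failure messages; an empty list means all gates passed,
    so the messages can be fed back into an agentic retry loop.
    """
    failures = []
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(source)
        path = f.name
    try:
        # Gate 1: compile check via the stdlib py_compile module.
        r = subprocess.run([sys.executable, "-m", "py_compile", path],
                           capture_output=True, text=True)
        if r.returncode != 0:
            failures.append("compile: " + r.stderr.strip())
            return failures  # no point testing code that doesn't compile
        # Gate 2: load the module and run a tiny test against it.
        test = f"import runpy; g = runpy.run_path({path!r}); assert g['double'](2) == 4"
        r = subprocess.run([sys.executable, "-c", test],
                           capture_output=True, text=True)
        if r.returncode != 0:
            failures.append("test: " + (r.stderr.strip() or "failed"))
        return failures
    finally:
        os.unlink(path)

# A broken candidate trips the compile gate; a fixed one clears both.
broken = "def double(x) return 2 * x"    # syntax error
fixed = "def double(x): return 2 * x"
print(run_gates(broken))  # non-empty: compile failure message
print(run_gates(fixed))   # []
```

The point is simply that the gates are mechanical and cheap: an agent that isn't even clearing Gate 1 before showing you code isn't being run through the loop at all.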
alecbz | 3 hours ago | parent
Yeah, performance optimization is potentially another area where LLMs can shine, if you already have a sufficiently comprehensive test suite, because no functionality is changing.

But if functionality is changing, you need to be in the loop at least enough to review the tests the LLM outputs. Sometimes that's easier than reviewing the code itself, but other times I think it requires similar levels of context.

Honestly, though, I think sane code organization is the bigger hurdle, and that's a lot harder to get right without manual oversight. Which of course leads to the temptation to give up on reviewing the code and just trust whatever the LLM outputs. I'm skeptical that's a viable approach: LLMs, like human devs, seem to need reasonably well-organized code to work in a codebase, but the code they output often falls short of that standard.

(But yes, agreed that getting the LLM to iterate until CI passes is table stakes.)
| ||||||||