simonw 9 hours ago:
"because I was diligent about test coverage, sonnet 4.5 perfectly converted the entire parser to tree-sitter for me. all tests passed." I often suspect that people who complain about getting poor results from agents haven't yet started treating automated tests as a hard requirement for working with them. If you don't have substantial test coverage your coding agents are effectively flying blind. If you DO have good test coverage prompts like "port this parser to tree-sitter" become surprisingly effective. | ||
l9o 9 hours ago (reply):
Yes, completely agree. Having some sort of guardrails for the LLM is extremely important. With the earlier models I would sometimes write tests to check that my coding patterns were being followed correctly: basic things like certain files/subclasses being in the correct directories, or making sure certain dunder methods weren't implemented in classes where I noticed models had a tendency to add them. These were all things the models would often get wrong, and they would typically be more of a lint warning in a more polished codebase. While a bit annoying to set up, it vastly improved the speed and success rate at which the models could solve tasks for me. Nowadays many of those tests don't seem to be as necessary. It's impressive to see how the models are evolving.
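For concreteness, a minimal sketch of what one of those guardrail tests could look like as a pytest module. The package layout and every name in it (myproject, Handler) are hypothetical stand-ins, not taken from the original setup:

    # Guardrail tests for project conventions. All module and class names
    # (myproject.handlers, Handler, myproject.models) are illustrative.
    import importlib
    import inspect
    import pathlib

    def test_handler_subclasses_live_in_handlers_dir():
        # Import the package and walk the Handler subclasses it exposes.
        # Note: __subclasses__() only returns *direct* subclasses.
        pkg = importlib.import_module("myproject.handlers")
        for cls in pkg.Handler.__subclasses__():
            source = pathlib.Path(inspect.getfile(cls)).resolve()
            # Fail loudly if a subclass was defined outside handlers/.
            assert "handlers" in source.parts, (
                f"{cls.__name__} defined in {source}, expected under handlers/"
            )

    def test_models_do_not_override_eq_or_hash():
        models = importlib.import_module("myproject.models")
        for _, cls in inspect.getmembers(models, inspect.isclass):
            for dunder in ("__eq__", "__hash__"):
                # vars() only sees methods defined in the class body itself,
                # not ones inherited from object, so this catches overrides.
                assert dunder not in vars(cls), (
                    f"{cls.__name__} implements {dunder}; models should rely "
                    "on the base class definitions"
                )

The nice part of expressing conventions this way is that the agent gets the same feedback loop a human would: a failing test names the file and the rule it broke, instead of the violation silently surviving review.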