joshstrange 2 hours ago
I'm not trying to be rude here at all, but are you manually verifying any of that? When I've had LLMs write unit tests they are quick to write pointless unit tests that seem impressive "2123/2123 tests passed!" but in reality it's testing mostly nothing of value. And that's when they aren't bypassing commit checks, commenting out tests, or saying "I fixed it all" while multiple tests are still broken. Maybe I need a stricter harness, but I feel like I did try that and still didn't get good results.
kaydub 44 minutes ago
I feel like it was doing what you're describing about 4-6 months ago, especially the commenting out of tests. Not always, but I'd have to do more things step by step and keep the LLM on track. Over the last 3-4 months, though, it's been writing decent unit tests without much hand-holding or refactoring.
enraged_camel 22 minutes ago
>> When I've had LLMs write unit tests they are quick to write pointless unit tests that seem impressive "2123/2123 tests passed!" but in reality it's testing mostly nothing of value.

This has not happened to me since Sonnet 4.5. Opus 4.5 is especially robust when it comes to writing tests. I use it daily in multiple projects and verify the test code.