Remix.run Logo
alganet 7 hours ago

It's funny because I'm evaluating LLMs for just this specific case (covering tests) right now, and it does that a lot.

I say "we need 100% coverage on that critical file". It runs for a while, tries to cover it, fails, then stops and say "Success! We covered 60% of the file (the rest is too hard). I added a comment.". 60% was the previous coverage before the LLM ran.