scared_together an hour ago

When I look at the commits themselves, most of the ones generated by Claude are testsuite changes, or at least labelled as such.

https://github.com/RsyncProject/rsync/commits/master/

vips7L 30 minutes ago

Aren’t LLMs notorious for just making tests pass and not actually testing functionality?

shimman an hour ago

Is that supposed to make this better? IME the most valuable tests are those that test specific regressions. It's the scaffolding we build for ourselves to enable feature development. Remove that scaffolding and you get accidents. Pray to your god of choice that these accidents don't cause harm or loss of life.

It should really be considered negligence at this point. Some of this software is extremely valuable; it's how we flourish as humans. Purposely fucking with that should bear some real-world consequences. We do the same in every other industry, and software is just as important.

abuob 20 minutes ago

From my perspective, "Analyze code, come up with edge cases and gaps and create unit tests for them" is one of the use cases AI has been getting really good at, so I can see why someone would want to extend their test suite dramatically using it.

But yes, using AI to then generate code that still causes regressions doesn't quite square with that. Given the huge number of test changes, I'd still assume good faith by the maintainer; possibly just a bit of overexcitement paired with a dash of too much confidence in the new tools, which is now hitting reality.