Remix.run Logo
voidUpdate 10 hours ago

I think the main functional reason is that because a human hasn't written the code, its potentially more likely to have subtle hidden bugs that a human cant explain because they didn't write it, as well as large pull requests that have to be validated by a human when smaller human written ones would be better. But I think it's generally the non-functional reasons that projects are rejecting LLM-generated code. Some developers just find LLM generated code icky, and would prefer not to be associated with it

drdaeman 9 hours ago | parent | next [-]

This makes sense, but I'm not sure it's directed at the actual issue.

There are probably some subtle bugs I can't explain in the code I wrote all by myself. I sure had a few "what was I possibly thinking when I wrote this" moments working on some old code - and that's only the bits I know about. And I sure had countless times people pointing out "hey, you got this stinky here" in a code review (which is the whole point of it). Attention lapses and brain farts sure happen. Slop can be more frequent with LLMs but it's certainly not a LLM-specific issue. They're very productive, there's a literal outbreak, and by the sheer volume shadow any The Daily WTF stories.

However, I can agree that LLM-generated code most likely has higher probability of slop. But then, a policy "a human contributor MUST fully know and understand all the contents of the submitted work, in fine detail, all the way down to every single line of contributed code and documentation" would probably address that in a more functional manner. And then the code can be from an LLM or monkeys with typewriters author had seen in his sleep. That stops to matter because author takes ownership and responsibility: "here's a recognized rational agent who swears by their work". Makes non-self-authored code require a lot more effort (unless it's a trivial change for obvious reason), but arguably even more robust than self-authored code.

That is, unless the PR authors tend lie about their knowledge - but that'll be a whole different story, where LLMs will be just a background detail.

(I'm not saying Godot should be done something different - their project, their rules, let's use that as an opportunity to watch how it goes. Just musing on the matter in general, if there's any rationally explainable merit in such policy.)

mschuster91 9 hours ago | parent | prev [-]

And on top of that - no matter if you develop open-source or proprietary software, who is to guarantee the AI didn't get trained with GPL (or even worse, leaked proprietary) source code? Who is going to pay my lawyer when someone files a copyright lawsuit and all I have as an excuse is that I "AI-laundered" my code?

And some projects like WINE or ReactOS probably have to worry about that even more given they need to guarantee clean-room reverse engineering...

voidUpdate 9 hours ago | parent | next [-]

Given the amount of web scraping LLM providers have been doing, I'd say it's likely that any code that is publicly accessible on the internet has been incorporated into it's training data, whatever its license

cyclotron3k 8 hours ago | parent | prev [-]

I'm not disagreeing with you, but it's worth noting there were plenty of GPL violations before LLMs existed.

mschuster91 8 hours ago | parent [-]

Sure, but at least when I write code by hand I can be fairly certain that I do not copy code without the appropriate license to do so.

cyclotron3k 2 hours ago | parent [-]

From godot's pov though, banning AI won't guarantee some arbitrary contributor's PR doesn't include GPL code.