pessimizer 2 hours ago
I might be crazy, and I'd love to hear from somebody who knows about this, but I've been assuming that AI companies have been pulling GPL code out of the training material specifically to avoid this. Corporations have always talked about the virality of the GPL, sometimes but not always to the point of exaggeration. You'd think that after getting the proof of concept done, the AI companies would be running away at full speed from setting a bomb like that in their goldmine. Putting in tons of commonly read books and scientific papers is safer: they can just eventually cross-license with the massive conglomerates that own everything. But the GPL is by nature hostile, and has been openly and specifically hostile from the beginning. With MIT, Apache, etc., you can just include a fistful of licenses to download, or even come up with architectures that track names to add for attribution-ware. But the GPL will obviously (and legitimately) claim to have relicensed the entire model and maybe all its output (unless they restricted it to LGPL). Wouldn't you just pull it out?
NateEag 2 hours ago
If you were a thoughtful, careful, law-abiding business, yes. I submit the evidence suggests the genAI companies have none of those attributes.
NiloCK 2 hours ago
Not crazy - there's a rational self-interest in doing this. But I'm not certain that the relevant players have the same consequence-fearing mindset that you do, and to be honest they're probably right. The theft is too great to calculate the consequences, and by the time it's settled, what are you gonna do - turn off Forster's machine? I hope you're right in at least some cases!
exasperaited 2 hours ago
> I might be crazy, and I'd love to hear from somebody who knows about this, but I've been assuming that AI companies have been pulling GPL code out of the training material specifically to avoid this.

Haha no.

https://windsurf.com/blog/copilot-trains-on-gpl-codeium-does...

And just in the last two days, AI generating LGPL headers (which it could not do if identifying LGPL code was pulled from the codebase) and misattributing authors:

https://devclass.com/2025/11/27/ocaml-maintainers-reject-mas...