jll29 3 hours ago

> there are many impactful open source programmers who have explicitly stated that they don't want their code used to train these models and licensed their work in a world where LLMs didn't exist. It wasn't their "gift", it was unwillingly taken from them.

There are subtle legal differences between "free open source" licensing and putting things in the public domain. If you use an open source license, you could forbid LLM training (in licensing law, contrary to all other areas of law, anything that is not granted to licensees is forbidden). Then you can take the big guys (MSFT, Meta, OpenAI, Google) to court if you can demonstrate they violated your terms.

If you place your software into the public domain, any use is fair, including ways to exploit the code or its derivatives not invented at the time of release.

Curiously, doesn't the GPL even imply that if you pre-train an LLM on GPLed code and use it to generate code (Claude Code etc.), all generated code, as the derived intellectual property it clearly is, must also be open-sourced under the GPL's terms? (That would seem to be in the spirit of the licensors.) I haven't seen this raised or discussed anywhere yet.

zahlman 2 hours ago

> If you use an open source license, you could forbid LLM training

Established OSS licenses all predate anyone imagining that LLMs would come into existence, let alone train on code and then generate it. Discriminating by purpose of use runs counter to the OSI principles (https://opensource.org/osd):

> 6. No Discrimination Against Fields of Endeavor

> The license must not restrict anyone from making use of the program in a specific field of endeavor. For example, it may not restrict the program from being used in a business, or from being used for genetic research.

The GPL argument you describe hinges on making the legal case that LLMs produce "derived works". When the output can't be clearly traced to source input (even the system itself doesn't know how) it becomes rather difficult to argue that in court.

singpolyma3 2 hours ago

You presuppose that the output is a derived work (not a given) and that training is not fair use (also not a given).

If the courts decide to apply the law as you assume, the AI companies are all dead. But they are all betting that's not going to be the case. And since so much of the industry is taking the bet with them, the courts will take that into account.