sirwhinesalot 3 hours ago

It's not free. There is a license attached. One you are supposed to follow and not doing so is against the law.

anonym29 3 hours ago | parent [-]

There's a deeper discussion here about property rights, about shrinkwrap licensing, about the difference between "learning from" and "copying", and about the realpolitik of software licensing agreements. There's also the matter of wanting to have your cake and eat it too: if you actually wanted to protect your intellectual property (stated preference), you might be expected to make your software proprietary rather than deliberately distribute instructions on how to reproduce an exact replica of it in order to benefit from the network effects of open distribution (revealed preference). But I'd be remiss not to point out that your username is not doing your credibility any favors here.

sirwhinesalot 3 hours ago | parent | next [-]

I'm not whining in this case, just pointing out that "they gave it out for free" is completely false, at the very least for the GNU types. It was always meant to come with plenty of strings attached, and when those strings were dodged, new strings were added (GPLv3, AGPL).

If I had a photographic memory and used it to replicate parts of GPLed software verbatim while erasing the license, I could not excuse it in court by claiming I had simply "learned from" the examples.

Some companies outright bar their employees from reading GPLed code because they see it as too high of a liability. But if a computer does it, then suddenly it is a-ok. Apparently according to the courts too.

If you're going to allow copyright laundering, at least allow it for both humans and computers. It's only fair.

shkkmo 2 hours ago | parent [-]

> If I had a photographic memory and I used it to replicate parts of GPLed software verbatim while erasing the license, I could not excuse it in court that I simply "learned from" the examples.

Right, because you would have done more than learn: you would have gone past learning and used that learning to reproduce the work.

It works exactly the same for an LLM. Training the model on content you have legal access to is fine. Afterwards, someone using that model to produce a replica of that content is engaged in copyright infringement.

You seem set on conflating the act of learning with the act of reproduction. You are allowed to learn from copyrighted works you have legal access to; you just aren't allowed to duplicate those works.

sirwhinesalot 2 hours ago | parent | next [-]

The problem is that it's not the user of the LLM doing the reproduction; the LLM provider is. The tokens the LLM spits out are coming from the LLM provider. It is the provider that is reproducing the code.

If someone hires me to write some code, and I give them GPLed code (without telling them it is GPLed), I'm the one who broke the license, not them.

shkkmo 2 hours ago | parent [-]

> The problem is that it's not the user of the LLM doing the reproduction, the LLM provider is.

I don't think this is legally true. The law isn't fully settled here, but things seem to be moving towards the LLM user holding the copyright in any work produced by prompting the LLM. It seems like this would also place the infringement onus on the user, not the provider.

> If someone hires me to write some code, and I give them GPLed code (without telling them it is GPLed), I'm the one who broke the license, not them.

If you produce code using an LLM, you (probably) own the copyright. If that code is already GPL'd, you would be the one engaged in infringement.

zephen 2 hours ago | parent | prev [-]

You seem set on conflating "training" an LLM with "learning" by a human.

LLMs don't "learn", but they _do_, in some cases, faithfully regurgitate what they have been trained on.

Legally, we call that "making a copy."

But don't take my word for it. There are plenty of lawsuits for you to follow on this subject.

shkkmo 2 hours ago | parent [-]

> You seem set on conflating "training" an LLM with "learning" by a human.

"Learning" is an established word for this, happy to stick with "training" if that helps your comprehension.

> LLMs don't "learn" but they _do_ in some cases, faithfully regurgitate what they have been trained on.

> Legally, we call that "making a copy."

Yes, when you use an LLM to make a copy... that is making a copy.

When you train an LLM, that isn't making a copy; that is training. No copy is created until output is generated that contains a copy.

zephen an hour ago | parent [-]

> Learning" is an established word for this

Only by people attempting to muddy the waters.

> happy to stick with "training" if that helps your comprehension.

And supercilious dickheads (though that is often redundant).

> No copy is created until output is generated that contains a copy.

The copy exists inside the LLM, albeit not in human-discernible form; otherwise it could not be generated on demand.

Despite you claiming that "It works exactly the same for a LLM," no, it doesn't.
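
To make the latent-copy claim concrete, here's a minimal sketch of the kind of memorization probe used in the extraction literature: feed a model the opening of a document that was plausibly in its training data and check whether greedy decoding continues it verbatim. This uses the Hugging Face transformers library; "gpt2" and the prompt are placeholder choices for illustration, not a claim about what any particular model has memorized.

    # Sketch of a verbatim-memorization probe. Assumes `transformers`
    # and a PyTorch backend are installed; "gpt2" is a placeholder model.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    # Opening of a widely distributed document as the probe prefix.
    prefix = "GNU GENERAL PUBLIC LICENSE\nVersion 3, 29 June 2007"
    inputs = tok(prefix, return_tensors="pt")

    # Greedy decoding: no sampling, so the output reflects only what
    # the weights encode as the most probable continuation.
    out = model.generate(**inputs, max_new_tokens=64, do_sample=False)
    continuation = tok.decode(out[0][inputs["input_ids"].shape[1]:])

    # If this matches the real license text token for token, the model
    # has memorized it: the "copy" was latent in the weights all along.
    print(continuation)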

michaelsshaw 3 hours ago | parent | prev [-]

We spread free software for multiple purposes, one of them being to advance the free software ethos. Using it to train proprietary models is antithetical to those ideas.

It's also an interesting double standard, wherein if I were to steal OpenAI's models, no AI worshippers would have any issue condemning my action, but when a large company clearly violates the license terms of free software, you give them a pass.

ronsor 3 hours ago | parent | next [-]

> if I were to steal OpenAI's models, no AI worshippers would have any issue condemning my action

If GPT-5 were "open sourced", I don't think the vast majority of AI users would seriously object.

sirwhinesalot 3 hours ago | parent [-]

OpenAI got really pissy about DeepSeek using other LLMs to train though.

Which is funny since that's a much clearer case of "learning from" than outright compressing all open source code into a giant pile of weights by learning a low-dimensional probability distribution of token sequences.
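
To illustrate what "compressing text into a probability distribution" means, here's a toy sketch: a bigram model trained by counting. It's nothing like a real transformer, but it shows how a model that stores only next-token statistics can still regurgitate its training text verbatim under greedy decoding. The corpus here is just a stand-in for any distinctive licensed text.

    # Toy bigram "language model": the only thing stored is a table of
    # next-token counts, yet greedy decoding reproduces the training
    # text verbatim because each token has a unique most-likely successor.
    from collections import Counter, defaultdict

    corpus = ("permission is hereby granted free of charge "
              "to any person obtaining a copy").split()

    model = defaultdict(Counter)  # the "weights": next-token statistics
    for prev, nxt in zip(corpus, corpus[1:]):
        model[prev][nxt] += 1

    def generate(token, max_len=20):
        out = [token]
        while token in model and len(out) < max_len:
            token = model[token].most_common(1)[0][0]  # greedy decoding
            out.append(token)
        return " ".join(out)

    # Prints the training sentence verbatim; no literal copy was stored.
    print(generate("permission"))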

anonym29 an hour ago | parent | prev [-]

I can't speak for anyone else, but if you were to leak weights for OpenAI's frontier models, I'd offer to hug you and donate money to you.

Information wants to be free.