input_sh 3 days ago

You seem like the type of person that will believe anything as long as someone cites a case without looking into it. Bartz v. Anthropic only looked at books, and there was still a $1.5 billion settlement that Anthropic paid out because it got those books from LibGen / Anna's Archive, and the ruling also said that the data has to be acquired "legitimately".

Whether data acquired from a licence that specifically forbids building a derivative work without also releasing that derivative under the same licence counts as a legitimate data gathering operation is anyone's guess, as those specific circumstances are about as far from that prior case as they can be.

eru 3 days ago | parent | next [-]

As long as they don't distribute the model's weights, even a strict interpretation of the GPL should be fine. Same reason Google doesn't have to upstream changes to the Linux kernel they only deploy in-house.

oblio 2 days ago | parent | next [-]

But LLMs do distribute the derived code they generate outside of their company. That's their entire point.

Akronymus 2 days ago | parent [-]

But wouldn't that be like some company using GPL-licensed code to host a code generator for something? At least in a legal interpretation. Or is that different?

oblio 2 days ago | parent | next [-]

And why would that be different or allowed? Sure, you get all the code you want, GPL licensed.

Everybody is trying to have their cake and eat it, too, by license laundering.

Heck, money laundering means you at least lose some of the money.

Akronymus 2 days ago | parent [-]

I have no idea. I was genuinely asking out of curiosity about what the law actually means for that, while speculating.

advael 2 days ago | parent | prev [-]

I mean, is the case you're making that you can run a SaaS business on GPL-derived code without fulfilling GPL obligations because you're not distributing a binary?

eru 2 days ago | parent | next [-]

Yes, that's exactly what people do and did. That 'loophole' is the whole reason people came up with https://en.wikipedia.org/wiki/GNU_Affero_General_Public_Lice...

Akronymus 2 days ago | parent | prev [-]

I guess I am. I genuinely am just a layperson trying to look at what the law would say, so everything is speculation.

advael 2 days ago | parent [-]

If true, that would seem to invalidate the entire GPL, but even by that logic, a website (such as ChatGPT) distributes JavaScript that runs the code, and programs like Claude Code also do so. Again, if you can slip the GPL's requirements through indirection like having your application phone home to your server to go get the infringing parts, the GPL would be essentially unenforceable in... most contexts

fragmede 2 days ago | parent [-]

That's where the AGPL comes in. The GPL(v2) does not require e.g. Google or Facebook to release any of the changes they've made to the Linux kernel. That they do so is not because of a legal obligation. The "go get the parts" thing is the relevant detail to be very specific on. If those parts are a binary that is used, then the GPL does kick in, but for distributing source code that's possibly derived, possibly not covered by copyright, it's not been decided in a court of law yet.

fsflover 2 days ago | parent | prev [-]

How about AGPL?

eru 2 days ago | parent [-]

Sure, that one was specifically designed to close that loophole.

ronsor 2 days ago | parent | prev [-]

Have you actually read the text of the GPL?

> This License acknowledges your rights of fair use or other equivalent, as provided by copyright law.

It is legitimate to acquire GPL software. The requirements of the license only occur if you're distributing the work AND fair use does not apply.

Training certainly doesn't count as distribution, so the buck passes to inference, which leaves us dealing with the substantial similarity test, and still, fair use.

apatheticonion 2 days ago | parent | next [-]

There is the clean room problem though.

If a human reads GPL code and outputs a recreation of that code (derivative work) using what they learned - that is illegal.

If an AI reads GPL code and outputs a recreation of that code using what it "learned" - it's not illegal?

If that is the case, then copyright holds no weight any more. I should be allowed to train an LLM on decompiled firmware (say, Playstation, Switch, iPhone) in countries where decompilation is legal - then have the LLM produce equivalent firmware that I later use to build an emulator (or competing open source firmware).

tpmoney 2 days ago | parent [-]

> If that is the case, then copyright holds no weight any more. I should be allowed to train an LLM on decompiled firmware (say, Playstation, Switch, iPhone) in countries where decompilation is legal - then have the LLM produce equivalent firmware that I later use to build an emulator (or competing open source firmware).

It's funny you mention that, because one of the biggest fair use cases that effectively cemented "fair use" for emulators is Sony Computer Entertainment, Inc. v. Connectix Corp.[1] where the copying of PlayStation BIOS files for the purposes of reverse engineering and creating an emulator was explicitly ruled to be fair use, including running that code through a disassembler.

[1]: https://en.wikipedia.org/wiki/Sony_Computer_Entertainment,_I....

input_sh 2 days ago | parent | prev [-]

You and I are not a fucking judge, our opinions on this don't matter one bit. We might as well print it on a piece of paper and wipe our asses with it.