tpmoney 2 days ago

I'm not really sure why you think my comment specifically citing the recent rulings by Judge Alsup and also the prior history with respect to the Google Books project is somehow declaring "I can do whatever I like to any copyrighted content", but I assure you I'm not. I'm very specifically talking about the various cases that have come about in the digital age dealing with fair use as it has been interpreted by US courts to apply to the use of computers to create copies of works for the purposes of creating other works.

I'm referring to the long history of carefully threaded fair use rulings and settlements, many of which we as an industry have benefited greatly from: determinations that cloning a BIOS can be fair use (see the IBM PC BIOS cloning efforts, but also Sony v. Connectix), that cloning an entire API to create a parallel competitive product can be fair use (Google v. Oracle), that digitizing books to make them searchable and even display portions of them to users can be fair use (Authors Guild v. Google), and that your cable company offering "remote DVR" copying of broadcast TV can be fair use (20th Century Fox v. Cablevision). Time and again the courts have found that copyright, especially as applied to digital transformations, is far more limited than large corporations would prefer. Further, they have found in plenty of cases that even a direct 1:1 copy of a source work can be fair use, let alone copies which are "transformative," as LLM training was found to be in Bartz.

Realistically, I don't see how anyone who has watched the various copyright cases decided in the digital age, and seen the battles the EFF (and a good part of the tech industry) have waged to reduce the strength of copyright, can fail to see how AI training can very easily fit within that same framework.

Not to cast aspersions on my fellow geeks and nerds, but it has been very interesting to me to watch the "hacker" world move from "information wants to be free" to copyright maximalism once it was their works being copied in ways they didn't like. For an industry that has brought about (and heavily promoted and supported) things like DeCSS, BitTorrent, Handbrake, Jellyfin/Plex, numerous emulators, WINE, BIOS and hardware cloning, ad blockers, web scrapers, and many other things that copyright owners have been very unhappy about, it's very strange to see this newfound respect for the sanctity of copyright.

> I can easily make a case that "buying a copy" in the case of a GPL-2 codebase is "agreeing to the license" and that such an agreement could easily say "anything trained on this must also be released as GPL-2".

And I would argue that obtaining a legal copy of the GPL source to a program requires no such agreement. By downloading a copy of a GPLed program, I am entitled by the terms under which that software was distributed to obtain a copy of the source code. I do not have to agree to any other terms to obtain that source code; downloading from someone authorized to distribute that code is in and of itself sufficient to entitle me to it. You cannot, by the very terms of the GPL itself, deny me a copy of the source code for GPL software you have distributed to me, even if you believe I intend to make distributions that are not GPL compliant. You can decline to distribute the software to me in the first place, but once you have distributed it, I am legally entitled to a copy of the source code. From there, now that I have a legal copy, the question becomes: is making additional copies for the purposes of training an AI model fair use? So far, the most definitive case we have on the matter (Bartz) says yes, it is.

So either we have to make the case that the original copy was acquired from a source not authorized to make that copy, or we have to argue that the AI model, or its output, is itself infringing. Given the ruling that such copying "was exceedingly transformative and was a fair use under Section 107 of the Copyright Act"[1], it seems unlikely that the model itself will be found infringing. That leaves the output of the model, which Bartz does not rule on, as the authors never alleged that the model's output was infringing. GPL software authors might be able to prevail on that point, but I think they would face a pretty uphill battle in demonstrating that the model generated infringing output rather than simply functional, necessary code that isn't covered by copyright. The copyrightability of code has long rested on a careful balance between protecting a larger creative idea and not simply walling off whole avenues of purely functional decisions from all competitors.

[1]: https://admin.bakerlaw.com/wp-content/uploads/2025/07/ECF-23...