davemp 2 days ago

Claiming LLMs are fair use is ridiculous bordering on ignorant or disingenuous.

Here’s the 4 part test from 17 U.S.C. § 107:

1. the purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes;

Fail. The use is to make trillions of dollars and be maximally disruptive.

2. the nature of the copyrighted work;

Fail. In many cases at least, the copy written code is commercial or otherwise supports livelihoods, and is the result of much high-skill labor with the express stipulation for reciprocity.

3. the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and

Fail. They use all of it.

4. the effect of the use upon the potential market for or value of the copyrighted work.

Fail to the extreme. There is already measurable decline in these markets. The leaders explicitly state that they want to put knowledge workers out of business.

- - -

Hell, LLMs don’t even pass the sniff test.

The only reason this stuff is being entertained is some combination of the prisoner’s dilemma and more classic greed.

cxr 2 days ago | parent | next [-]

This comment highlights a basic dilemma about how and where to spend your time.

Here's a basic rule of thumb I recommend people apply when it comes to these sorts of long, contentious threads where you know that not every person showing up to the conversation is limiting themselves to commenting about things they understand and that involve some of the most tortured motivated reasoning about legal topics:

If the topic is copyright and someone who is speaking authoritatively has just used the words "copy written", then ignore them. Consider whether you need to be anywhere in the conversation at all, even as a purely passive observer. Think about all the things you can do instead of wasting your time here, where the stakes for participation are so low because nothing that is said here really matters. Go do something productive.

2 days ago | parent | next [-]
[deleted]
davemp 2 days ago | parent | prev [-]

Yet you still wasted your own time and everyone else’s time with a reply that has even less substance.

I was making an argument based on quotes from the actual legal code and you’re saying peons who don’t use the exact correct terminology shouldn’t even consider what should or shouldn’t be legal? What a load of junk. This is a democracy. We’re supposed to be engaging with it.

luma 2 days ago | parent | prev | next [-]

You’re mixing up “using” with “copying”. You are allowed to “use” all of a book or movie or code by listening to or watching or reviewing the whole thing. Copyright protects copies. The legal claim here is that training an LLM is sufficiently transformative such that it cannot be construed as a copy.

davemp 2 days ago | parent [-]

I replied to someone saying that it’s fair use, which presupposes that it’s a derivative work.

joshuacc 2 days ago | parent | prev | next [-]

These are factors to be considered, not pass/fail questions.

tpmoney 2 days ago | parent | prev [-]

> Fail. The use is to make trillions of dollars and be maximally disruptive.

Fair use has repeatedly been found even in cases where the copies were used for commercial purposes. See Sony v. Connectix for example, where the cloning and disassembly of the PlayStation BIOS for the purposes of making a commercially sold (at retail, in a box) emulator of a then currently sold game console was determined to be fair use.

> Fail. In many cases at least, the copy written code is commercial or otherwise supports livelihoods, and is the result of much high-skill labor with the express stipulation for reciprocity.

Again, see Sony v. Connectix where the sales of PlayStation consoles support the livelihoods and skilled labor of Sony engineers.

> Fail. They use all of it.

And again, see Sony v. Connectix, where the entire BIOS was copied again and again until a clone could be written that sought to reproduce all the functionality of the real BIOS. Or see Google v. Oracle where cloning the entire Java API for a competing commercial product was also deemed fair use. Or the Google Books lawsuits, where cloning entire books for the purposes of making them searchable online was deemed fair use. Or see any of the various time/format shifting cases over the years (cassette tapes, VCRs, DVRs, MP3 encoders, DVD ripping etc.) where making whole and complete copies of works is deemed fair use.

> Fail to the extreme. There is already measurable decline in these markets. The leaders explicitly state that they want to put knowledge workers out of business.

Again, see Sony v. Connectix where the commercial product deemed to be fair use was directly competing with an actively sold video game console. Copyright protects the rights of creators to exploit their own works, it does not protect them against any and all forms of competition.

Or perhaps instead of referring you to the history of legislation around copyright in the digital age, I should instead simply point you at Judge Alsup's ruling in the Bartz case where he details exactly why the facts of the case and prior case law find that training an AI on copyrighted material is fair use [1]. Of particular interest to you might be the fact that each of the 4 factors is not a simple "pass/fail" metric, but a weighing of relative merits. For example, when examining factor 1, Judge Alsup writes:

> That the accused is a commercial entity is indicative, not dispositive. That the accused stands to benefit is likewise indicative. But what matters most is whether the format change exploits anything the Copyright Act reserves to the copyright owner.

[1]: https://admin.bakerlaw.com/wp-content/uploads/2025/07/ECF-23...

davemp a day ago | parent [-]

I appreciate the detailed reply and that there’s subtlety here.

I read the linked Bartz case. It’s disappointing that it seems limited to only the copying of books into a data set and not the result of training LLM on protected works. This is not the “use” that I was discussing and not very interesting.

The plaintiffs didn’t even challenge that the outputs of the LLMs infringe. The judge seems to agree (at least by omission) that fair use wouldn’t apply but that the outputs were transformative and in cases where they aren’t:

> [anthropic] placed additional software between the user and the underlying LLM to ensure that no infringing output ever reached the users.

So this is not true:

> he [the judge] details exactly why the facts of the case and prior case law find that training an AI on copyrighted material is fair use

The plaintiffs also make really awful arguments about “memorizing” and “learning” that falsely anthropomorphize LLMs. Which the judge shoots down.

If we’re going to give LLMs the same rights as humans, there’s unlikely to be much of an argument.

I think there’s potential for an argument about how LLMs use “compressed” versions of protected works to _mechanically_ traverse language space. It would be subtle and technical so maybe not likely to work in our current context.

tpmoney 21 hours ago | parent [-]

> It’s disappointing that it seems limited to only the copying of books into a data set and not the result of training LLM on protected works. This is not the “use” that I was discussing and not very interesting.

I agree that a ruling on the outputs specifically would have been interesting and instructive, but I disagree with the interpretation that by omission fair use would not apply to those outputs. The outputs were not challenged, as the judge notes, because the plaintiffs did not allege the outputs of the AI were infringing. The only conclusion we can really draw from this is that the plaintiffs didn't think they could make a good case for the outputs being infringing. Maybe GPL software authors could do so, but clearly these book authors did not think they could. Judge Alsup does note that it's certainly possible for those outputs to be infringing, but that such a case would have to be litigated separately.

And again, this all makes sense to me if you've followed copyright law through the digital age. A xerox machine can be used to create verbatim, clearly infringing copies of works covered by copyright. But that being the case does not mean that making a xerox machine is a violation of copyright, even if you use copyrighted material to test the machine. It does not mean that selling a xerox machine is a violation of copyright, even if you use copyrighted material to demonstrate the capabilities when selling the machine. And it does not mean that every use of a xerox machine is inherently a copyright violation, even if any individual use can be.

Similarly, consider CD ripping software (like iTunes) or DVD/BluRay ripping software like Handbrake. I would be comfortable betting that over 90% of all copies made by iTunes or Handbrake are copies of works that the copy maker does not own copyright to (remember the "Rip, Mix, Burn" iTunes commercials?). But that being the case, iTunes' CD ripping capabilities and Handbrake's DVD ripping capabilities are not themselves copyright violations, nor is distributing that software, even with instructions for how the end user can use that software to make copies of material that they do not own the copyright for. That this software can enable piracy on a mass scale does not inherently make every use of the software a copyright violation. Whether or not the output of iTunes or Handbrake is "fair use" is and must be litigated on an individual basis. The output is not inherently one or the other.

> The plaintiffs also make really awful arguments about “memorizing” and “learning” that falsely anthropomorphize LLMs. Which the judge shoots down.

> If we’re going to give LLMs the same rights as humans, there’s unlikely to be much of an argument.

Judge Alsup goes much further than just "shoot[ing] down" the arguments about memorizing and learning. He also very explicitly says right on page 9:

    To summarize the analysis that now follows, the use of the books at issue to train Claude
    and its precursors was exceedingly transformative and was a fair use under Section 107 of the
    Copyright Act.

and later:

    In short, the purpose and character of using copyrighted works to train LLMs to generate
    new text was quintessentially transformative. Like any reader aspiring to be a writer,
    Anthropic’s LLMs trained upon works not to race ahead and replicate or supplant them — but
    to turn a hard corner and create something different. If this training process reasonably
    required making copies within the LLM or otherwise, those copies were engaged in a
    transformative use.