tpmoney 21 hours ago

> It’s disappointing that it seems limited to only the copying of books into a data set and not the result of training LLM on protected works. This is not the “use” that I was discussing and not very interesting.

I agree that a ruling on the outputs specifically would have been interesting and instructive, but I disagree with the interpretation that, by omission, fair use would not apply to those outputs. The outputs were not challenged, as the judge notes, because the plaintiffs did not allege the outputs of the AI were infringing. The only conclusion we can really draw from this is that the plaintiffs didn't think they could make a good case for the outputs being infringing. Maybe GPL software authors could do so, but clearly these book authors did not think they could. Judge Alsup does note that it's certainly possible for those outputs to be infringing, but that such a case would have to be litigated separately.

And again, this all makes sense to me if you've followed copyright law through the digital age. A xerox machine can be used to create verbatim, clearly infringing copies of works covered by copyright. But that being the case does not mean that making a xerox machine is a violation of copyright, even if you use copyrighted material to test the machine. It does not mean that selling a xerox machine is a violation of copyright, even if you use copyrighted material to demonstrate the capabilities when selling the machine. And it does not mean that every use of a xerox machine is inherently a copyright violation, even if any individual use can be.

Similarly, consider CD ripping software (like iTunes) or DVD/Blu-ray ripping software like Handbrake. I would be comfortable betting that over 90% of all copies made by iTunes or Handbrake are copies of works that the copy maker does not own the copyright to (remember the "Rip, Mix, Burn" iTunes commercials?). But that being the case, iTunes' CD ripping capabilities and Handbrake's DVD ripping capabilities are not themselves copyright violations, nor is distributing that software, even with instructions for how the end user can use that software to make copies of material that they do not own the copyright for. That this software can enable piracy on a mass scale does not inherently make every use of the software a copyright violation. Whether or not the output of iTunes or Handbrake is "fair use" is and must be litigated on an individual basis. The output is not inherently one or the other.

> The plaintiffs also make really awful arguments about “memorizing” and “learning” that falsely anthropomorphize LLMs. Which the judge shoots down.

> If we’re going to give LLMs the same rights as humans, there’s unlikely to much of an argument.

Judge Alsup goes much further than just "shoot[ing] down" the arguments about memorizing and learning. He also very explicitly says, right on page 9:

    To summarize the analysis that now follows, the use of the books at issue to train Claude
    and its precursors was exceedingly transformative and was a fair use under Section 107 of the
    Copyright Act.

and later:

    In short, the purpose and character of using copyrighted works to train LLMs to generate
    new text was quintessentially transformative. Like any reader aspiring to be a writer,
    Anthropic’s LLMs trained upon works not to race ahead and replicate or supplant them — but
    to turn a hard corner and create something different. If this training process reasonably
    required making copies within the LLM or otherwise, those copies were engaged in a
    transformative use.