jawns 3 days ago

You've misunderstood the case.

The suit isn't about Anthropic training its models using copyrighted materials. Courts have generally found that to be legal.

The suit is about Anthropic procuring those materials from a pirated dataset.

The infringement, in other words, happened at the time of procurement, not at the time of training.

If it had procured them from a legitimate source (e.g. licensed them from publishers) then the suit wouldn't be happening.

greensoap 3 days ago | parent | next [-]

A point of clarification and some questions.

The portion the court said was bad was not Anthropic getting books from pirated sites to train its model. The court opined that training the model was fair use and did not distinguish between getting the books from pirated sites or hard copy scans. The part the court said was bad, which was settled, was Anthropic getting books from a pirate site to store in a general purpose library.

--

  "To summarize the analysis that now follows, the use of the books at issue to train Claude
  and its precursors was exceedingly transformative and was a fair use under Section 107 of the
  Copyright Act. And, the digitization of the books purchased in print form by Anthropic was
  also a fair use but not for the same reason as applies to the training copies. Instead, it was a
  fair use because all Anthropic did was replace the print copies it had purchased for its central
  library with more convenient space-saving and searchable digital copies for its central
  library — without adding new copies, creating new works, or redistributing existing copies.
  However, Anthropic had no entitlement to use pirated copies for its central library. Creating a
  permanent, general-purpose library was not itself a fair use excusing Anthropic’s piracy."

  "Because the legal issues differ between the *library copies* Anthropic purchased and
  pirated, this order takes them in turn."

--

Questions

As an author, do you think it matters where the book was copied from? Presumably, a copyright gives the author the right to control when a text is reproduced and distributed. If the AI company buys a book and scans it, they are reproducing the book without a license, correct? And fair use is the argument that even though they violated the copyright, they are excused. In a pure sense, if the AI company copied (assuming they didn't torrent back the book) from a "pirate source", why is that copy worse than if they copied from a hard copy?

8note 3 days ago | parent | next [-]

> AI company buys a book and scans it, they are reproducing the book without a license, correct

isn't digitizing your own copies as backups and for personal use fine? so long as you don't give away the original while keeping the backups. similarly, don't give away the digital copies.

esrauch 3 days ago | parent [-]

It is, Google Books did it over a decade ago (bought up physical books and scanned them all). There were some rulings about how much of a snippet they were allowed to show end users as fair use, but I'm fairly sure the actual scanning and indexing of the books was always allowed.

cortesoft 3 days ago | parent | prev [-]

> If the AI company buys a book and scans it, they are reproducing the book without a license, correct?

No? I think there are a lot more details that need to be known before answering this question. It matters what they do with it after they scan it.

greensoap 3 days ago | parent [-]

That is only relevant to whether it is fair use, not to whether the copying is an infringement. Fair use is what is called an affirmative defense -- it means that yes, what I did was technically a violation, but it is forgiven. So on technicalities the copying is an infringement, but that infringement is "okay" because there is a fair use. A different scenario is if the copyright owner gives you a license to copy the work (like open source licenses). In that scenario the copying was not an infringement because a license exists.

gpm 3 days ago | parent | next [-]

> Fair use is what is called an affirmative defense

Yes

> it means that yes what I did was technically a violation but is forgiven

Not at all. All "affirmative defence" means is that procedurally the burden is on me to establish that I was not violating the law. The law isn't "you can't do the thing", rather it is "you can't do the thing unless it's like this". There is no violation, there is no forgiveness as there is nothing to forgive, because it was done "like this" and doing it "like this" doesn't violate the law in the first place.

cortesoft 3 days ago | parent | prev [-]

If I have an app on my phone that lets me point my phone at a page to scan, OCR, and read the page out loud to me, it wouldn't even require fair use, would it?

mmargenot 3 days ago | parent | prev | next [-]

Do foundation model companies need to license these books or simply purchase them going forward?

sharkjacobs 3 days ago | parent | next [-]

> On June 23, 2025, the Court rendered its Order on Fair Use, Dkt. 231, granting Anthropic’s motion for summary judgment in part and denying its motion in part. The Court reached different conclusions regarding different sources of training data. It found that reproducing purchased and scanned books to train AI constituted fair use. Id. at 13-14, 30–31. However, the Court denied summary judgment on the copyright infringement claims related to the works Anthropic obtained from Library Genesis and Pirate Library Mirror. Id. at 19, 31.

https://www.documentcloud.org/documents/26084996-proposed-an...

> reproducing purchased and scanned books to train AI constituted fair use

greensoap 3 days ago | parent | next [-]

Actually, the court really only said downloading a pirated book to store in your "library" was bad. The opinion is (intentionally?) ambiguous on whether the decision regarding copies used to train an LLM applies only to scanned books or also to pirated books. The facts found in the case are that the training datasets were made from the "library" copies of books, which included scans and pirated downloads. And the court said the training copies were fair use. The court also said the scanned library copies were fair use. The court found that the pirated library copies were not fair use. The court did not say for certain whether the pirated training copies were fair use.

thaumasiotes 3 days ago | parent | prev [-]

The usual analysis was that when you download a book from Library Genesis, that is an instance of copyright infringement committed by Library Genesis. This ruling appears to reverse that analysis.

papercrane 3 days ago | parent [-]

Do you have a source for that? MAI Systems Corp. v. Peak Computer, Inc. established that even creating a copy in RAM is considered a "copy" under the Copyright Act and can be infringement.

parineum 3 days ago | parent [-]

It's not an issue of where it's being copied, it's who's doing the copying.

Library Genesis has one copy. It then sends you one copy and keeps its own. The entity that violated the _copy_right is the one that copied it, not the one with the copy.

masfuerte 3 days ago | parent [-]

There are many copies made as the text travels from Library Genesis to Anthropic. This isn't just of theoretical interest. English law has specific copyright exemptions for transient copies made by internet routers, etc. It doesn't have exemptions for the transient copies made by end users such as Anthropic, and they are definitely infringing.

Of course, American law is different. But is it the case that copies made for the purpose of using illegally obtained works are not infringing?

thaumasiotes 3 days ago | parent [-]

> But is it the case that copies made for the purpose of using illegally obtained works are not infringing?

Well, the question here is "who made the copy?"

If you advertise in seedy locations that you will send Xeroxed copies of books by mail order, and I order one, and you then send me the copy I ordered, how many of us have committed a copyright violation?

masfuerte 3 days ago | parent [-]

Copyright law is literally about the copies. A xeroxed book is exactly one copy. Mailing and reading that book doesn't copy it any further. In contrast, you can't do anything with digital media without making another copy.

> "Who made the copy?"

This begs the question. With digital media everybody involved makes multiple copies.

bhickey 3 days ago | parent | prev [-]

Probably the latter.

gowld 3 days ago | parent | prev [-]

I thought that distribution of copyrighted materials was legally encumbered, not reception thereof.

lawlessone 3 days ago | parent | next [-]

Did they use a torrent? If they used a torrent isn't it likely they distributed it while downloading it?

gkbrk 3 days ago | parent [-]

Surely a state-of-the-art tech company would know how to disable seeding.

LeoPanthera 3 days ago | parent [-]

BitTorrent clients will not send data to clients which aren't uploading, as far as I know.

adrr 3 days ago | parent | prev | next [-]

Downloading is making a copy and is covered by copyright law. It's also covered by the statutory damages clause of up to $150k per violation if willful. I assume Anthropic knew they were using pirated books.

thayne 3 days ago | parent | prev [-]

Do you have a source for that? My understanding was that both were illegal, although of course media companies have an interest in making people believe that even if it isn't true.