CursedSilicon 2 days ago

That's quite a strawman definition of "copyright infringement", especially given the ongoing Anthropic lawsuit.

It's not a question of whether feeding all the world's books into a blender and eating the resulting slurry is copyright infringement. It's that they stole the books in the first place by getting them from piracy websites.

If they'd purchased every book ever written, scanned them in and fed that into the model? That would be perfectly legal

steveklabnik 2 days ago

That’s what happened; the initial piracy was an issue, but those models were never released, and the models that were released were trained on copyrighted works they purchased.

boristsr 2 days ago

That's not true, or they wouldn't have settled for $1.5 billion specifically for training on pirated material.

https://apnews.com/article/anthropic-copyright-authors-settl...

steveklabnik a day ago

As I said, the initial piracy was an issue. That is what they settled over. Your link covers this:

> A federal judge dealt the case a mixed ruling in June, finding that training AI chatbots on copyrighted books wasn’t illegal but that Anthropic wrongfully acquired millions of books through pirate websites.

It also gives more detail on how they later acquired the books legally, which was fine but did not excuse the earlier piracy:

> But documents disclosed in court showed Anthropic employees’ internal concerns about the legality of their use of pirate sites. The company later shifted its approach and hired Tom Turvey, the former Google executive in charge of Google Books, a searchable library of digitized books that successfully weathered years of copyright battles.

> With his help, Anthropic began buying books in bulk, tearing off the bindings and scanning each page before feeding the digitized versions into its AI model, according to court documents. That was legal but didn’t undo the earlier piracy, according to the judge.