gooosle 7 days ago

So... it would be a lot cheaper to just buy all of the books?

gpm 7 days ago | parent | next [-]

Yes, much.

And they actually went and did that afterwards. They just pirated them first.

dude250711 7 days ago | parent | next [-]

What is the HN term for this? "Bootstrapping" your startup? Or is it "growth-hacking" it?

gpm 7 days ago | parent | next [-]

The latter (I know you're joking, but...)

Bootstrapping in the startup world refers to starting a startup using only personal resources instead of using investors. Anthropic definitely had investors.

chrisvenum 7 days ago | parent | prev [-]

Bookstrapping

rise_before_sun 7 days ago | parent | prev [-]

Where can I find source that says Anthropic bought the pirated books afterwards? I haven't seen this in any official document.

Also, do we know if the newer models were trained without the pirated books?

gpm 7 days ago | parent [-]

> Where can I find source that says Anthropic bought the pirated books afterwards? I haven't seen this in any official document.

https://storage.courtlistener.com/recap/gov.uscourts.cand.43...

> Also, do we know if the newer models were trained without the pirated books?

I'm pretty sure we do but I couldn't swear to it or quickly locate a source.

rise_before_sun 7 days ago | parent [-]

Thanks for the link.

Among several places where the judge mentions Anthropic buying legit copies of books it pirated, this sentence is probably the most relevant: "That Anthropic later bought a copy of a book it earlier stole off the internet will not absolve it of liability for the theft but it may affect the extent of statutory damages."

But document does not say Anthropic bought EVERY book it pirated. Other sections in the document also don't explicitly say that EVERY pirated book was later purchased.

I stopped using Claude when this case came to light. If the newer Claude models don't use pirated books, I can resume using it.

When you say, "I'm pretty sure we do...", do you mean that pirated books were used, or were they not used?

gpm 7 days ago | parent [-]

> But document does not say Anthropic bought EVERY book it pirated

Yeah, I wouldn't make this exact claim either. For instance it's probably safe to assume that the pirate datasets contain some books that are out of circulation and which Anthropic happened not to get a used copy of.

They did happen to get every book published by any of the lead plaintiffs though, as a point towards them probably having pretty good coverage. And it does seem to have been an attempt to purchase "all" the books for reasonable approximate definitions of "all".

> When you say, "I'm pretty sure we do...", do you mean that pirated books were used, or were they not used?

I'm pretty sure pirated books were not used, but not certain, and I really don't remember when/why I formed that opinion.

eviks 7 days ago | parent | prev | next [-]

That might be practically impossible given the number of rights holders worldwide

privatelypublic 7 days ago | parent | prev | next [-]

The permission to buy them was already settled by Google Books in the 00's.

_alternator_ 7 days ago | parent | prev | next [-]

They did, but only after they pirated the books to begin with.

privatelypublic 7 days ago | parent | prev [-]

Few. This settlement potentially weakens all challenges to the use of copyrighted works in training LLMs. I'd be shocked if behind closed doors there wasn't some give and take on the matter between executives and investors.

A settlement means the claimants no longer have a claim, which means if they're also part of, say, the New York Times affiliated lawsuit, they have to withdraw. A neat way of kneecapping a countrywide decision that LLM training on copyrighted material is subject to punitive measures, don't you think?

freejazz 7 days ago | parent [-]

That's not even remotely true. Page 4 of the settlement describes released claims which only relate to the pirating of books. Again, the amount of misinformation and misunderstanding I see in copyright-related threads here ASTOUNDS.

privatelypublic 7 days ago | parent [-]

Did you miss the "also"? How about "adjacent"? I won't pretend to understand the legal minutiae, but reading the settlement doesn't mean you do either.

In my experience and training at a fintech corp, accepting a settlement in any suit weakens your defense, but prevents a judgement and future claims for the same claims from the same claimants (a la double jeopardy). So, again, at minimum, this prevents an actual judgement. Which likely would be positive for the NYT (and adjacent) cases.

freejazz 6 days ago | parent [-]

I'm not sure how your confusion about what's going on is being projected onto me. What about "also"? What about "adjacent"?

> In my experience and training at a fintech corp, accepting a settlement in any suit weakens your defense, but prevents a judgement and future claims for the same claims from the same claimants (a la double jeopardy). So, again, at minimum, this prevents an actual judgement. Which likely would be positive for the NYT (and adjacent) cases.

Okay? I'm an IP litigator and you clearly have no idea what you're talking about. The only thing left to try in this case was the book library piracy. Alsup's fair use decision is just as relevant, is not mooted by the settlement, and will be cited by anyone who thinks it's favorable to them.