rvnx 5 hours ago

Why do LLM companies that depended on Anna's Archive end up so clean? It looks like Anna's Archive was doing the dirty work, and the LLM companies were reaping the profits (and, ironically, still do, as they hold the largest databases of pirated content in the world).

Is it because the law doesn't apply to you when you have 1B USD?

random3 5 hours ago | parent | next [-]

While that may be the case, it’s hard to make this claim when:
- Anthropic settled a similar case
- Anna didn’t show up in court

metadat 4 hours ago | parent | next [-]

Showing up is a trap for Anna - who doesn't have 5 billion dollars to settle.

contubernio 4 hours ago | parent | prev | next [-]

Justice should not depend on whether the aggrieved appears in court. That's a structural weakness of US law.

TremendousJudge 4 hours ago | parent [-]

is there a country where if you don't show up to court you don't lose by default?

philistine 3 hours ago | parent | next [-]

Exactly, how can you credibly mount a defence if you're unwilling to appear? Your defence is that, yours.

Schmerika 2 hours ago | parent | prev | next [-]

Supposing that to be true... Does justice depend on what every other country is doing?

nibbleyou 3 hours ago | parent | prev [-]

[dead]

ffsm8 4 hours ago | parent | prev | next [-]

Uh, aren't you confirming his opinion with that? After all, Anna doesn't have the money to fight this in court

YetAnotherNick 4 hours ago | parent [-]

No. Anthropic fought and paid $1.5 billion in settlement and agreed to delete all the copyrighted material.

ffsm8 4 hours ago | parent | next [-]

I'm confused here, how is this not even more of a confirmation?

Essentially: have funny amounts of money and the law ceases to matter. Or don't, and be squashed by the rights holders.

jstanley 4 hours ago | parent [-]

$1.5 billion is more than $19.5 million though.

whycome 4 hours ago | parent | prev [-]

Delete? Wasn’t that material already used to train models?

rho_soul_kg_m3 4 hours ago | parent | next [-]

All AI companies should be forced to re-train their models without the offending materials, and this should also extend to all LLMs distilled from models exposed to copyrighted works. It should also cover code under licences such as the GPL, not to mention patents and designs. This whole LLM business is a giant IP laundromat.

saidnooneever 4 hours ago | parent | prev [-]

Well, I guess it's copyright, not distill-statistical-model-from-it-rights.

jasonmp85 4 hours ago | parent | prev [-]

Anthropic knows they could just pay off the aggrieved party.

The operators of Anna's know they will go to prison.

tim333 3 hours ago | parent | prev | next [-]

You can make an argument that training an LLM on something is not the same as copying it in the same way that your brain is not in breach of copyright for having watched a Disney movie. I'm not sure of the rights and wrongs of that but it complicates legal action.

nemomarx 3 hours ago | parent [-]

Can I download an archive of movies so a human animator can study the techniques there?

Surely you have to make the copy to feed it into the llm for training, so

tim333 3 hours ago | parent [-]

I think some of the LLM companies have used legally purchased materials.

nemomarx 2 hours ago | parent [-]

Do you happen to have an example? The closest I can think of is Adobe for images, but I've never heard of a text-based LLM trained purely on legally acquired books.

tim333 2 hours ago | parent [-]

There's stuff here on Anthropic https://www.reddit.com/r/books/comments/1lkv2r9/anthropic_de...

TiredOfLife 3 hours ago | parent | prev [-]

Distribution. Anna's Archive actively distributes the pirated material. LLM companies don't.

e12e an hour ago | parent | next [-]

I'd argue the LLMs certainly distribute copyrighted material. That's why they can do things like:

https://g.co/gemini/share/20843b4609d9

Now, you could argue quotes are fair use - but can you argue the material isn't part of the LLM?

bubblegumcrisis 3 hours ago | parent | prev [-]

Fruit of the poisonous tree.