bawolff a day ago

> Most generative AI corpora were arguably trained on copyrighted material, making the output potentially infringing.

Training is not necessarily sufficient for it to be a derivative work, just like learning to draw from famous drawings doesn't mean every single drawing you ever make is infringing.

Obviously there are cases where it could be infringing; it's going to depend on how close the output is to the original.

I guess it depends on how you read the post: is it saying use gen-AI to intentionally recreate the photo, something that sounds danger-zone, or is it saying use gen-AI to make some other photo suitable for the purpose?

ghaff a day ago | parent | next [-]

I'm largely out of this space now but my understanding is that some copyright cases around model training are winding through courts but I haven't seen anything definitive come out. The IP lawyers I know are skeptical but we'll see.

p_l a day ago | parent | next [-]

The EU AI Act is moving towards genAI output being non-copyrightable, and towards requiring that you actually prove derivative character from specific copyrighted work(s) to claim infringement.

AFAIK American law is going towards a similar setup.

ghaff a day ago | parent [-]

IANAL but, yes, with US/UK (i.e. common law regimes) that's along the lines of my understanding as well. Which I generally agree with even if some/many readers here probably do not. Of course, output being copyrightable and copyright infringement on the inputs are two different things.

p_l a day ago | parent [-]

An important point in copyright infringement is that it generally applies on distribution to other parties.

So the process of acquiring inputs may or may not be an infringement, but with at least the proposed EU rules it does not matter to the created model itself.

The exception is that the output it produces is judged for infringement the same way as human output, without any "transformative work" credit to the model. It's similar to how a human could commit a book or painting to memory: a close enough reproduction from memory would be infringement, but using the ideas taken from it generally would not be.

pbhjpbhj 5 hours ago | parent [-]

Actually, copyright generally is infringed when a copy is made; hence the name.

That's why, say, 17 USC 106 lists reproduction as the first exclusive right of a copyright holder. And why Berne Article 9 [^1] is about restricting the right of reproduction to the author.

Damages are often, in different jurisdictions, related to actual harm. So, distribution is the focus of lawsuits because actual harm in the making of a copy is usually negligible. Few people are suing to stop copying, they're suing to be recompensed for the [potential] commercial benefit derived from the copying.

Insofar as you need to make a copy to use it to process and adjust the weights of an ML model, then yes, this activity is an infringement of the right to control reproduction.

One of the measures for transformative use is whether the production of the copy commercially harms the original creator/author. I can't see how you can argue that ML models don't do that. Besides which, we don't have an equivalent precedent to 'transformative use' in the UK, so where our courts can go with all this is not clear.

[^1]: https://www.wipo.int/wipolex/en/text/283698

kirrent a day ago | parent | prev [-]

Bartz v Anthropic is some good authority on fair use (https://storage.courtlistener.com/recap/gov.uscourts.cand.43...). Still, it can't be said to be definitive because the plaintiff's arguments on market harm (with respect to fair use, not piracy) were limited and there were, as far as I can remember, no compelling examples provided of model output reproducing large swathes of training text.

LoganDark a day ago | parent | prev | next [-]

> Training is not necessarily sufficient for it to be a derivative work, just like learning to draw from famous drawings doesn't mean every single drawing you ever make is infringing.

We don't know that model training is the same thing as inspiration. Training is a mathematical process with theoretically deterministic outputs. It's converging the weights towards being able to exactly reproduce the training data, rather than parts of the training data subjectively influencing a creative output. We will just have to see how this plays in court.

lelandfe a day ago | parent | prev [-]

Sometimes human writers sit down to write and accidentally end up verbatim reproducing an NYT paywalled article, too, and no one bats an eye, but AI does it and allll of a sudden we’re in court? Poppycock!