martin-t 5 days ago

LLM providers should only be allowed to train on data in the public domain, or their models and outputs should inherit the license of the training data.

And people should own all data about themselves, all rights reserved.

It's ironic that copyright is the law that protects against this kind of abuse. And this is of course why big "AI" companies are trying to weaken it by arguing that model training is not derivative work.

Or by claiming that writing a prompt in 2 minutes is enough creative work to own the copyright of the output, despite the model being based on 10^12 hours of human work, give or take a few orders of magnitude.

j45 5 days ago

Makes sense, though we have to deal with the cat being out of the bag.

If this is implemented as a rule going forward, the groups that didn't train only on public domain content would have an advantage, at least for some time.

New models following this rule could fall behind, creating a gap.

I'm sure competition, as has been seen from open-source models, will be able to

martin-t 5 days ago

It's simple: the current models and their usage are copyright infringement.

Just because everyone is doing it doesn't mean it's right or legal. It only means that a lot of very rich companies deserve to be punished and to pay the creators.

j45 4 days ago

I was referring to the issue of new models having to be trained differently from the original ones.

I'm not arguing or debating the legality of what the models have done.

Anthropic just paid a settlement. But they also bought a ton of books and scanned them, which might be more than other companies have done. Maybe it's a sign of things to come.

martin-t 2 days ago

$3000 per book is laughable, given that they can then use the book as much as they want, and if they build a good enough model, it'll completely obviate the need for the book.

Copyright was designed at a time when reproducing a work in a way that was neither verbatim nor obviously modified to avoid detection (like synonym replacement) would require a lot of human work and be too costly to do. Now it's automated, and that fundamentally changes everything.

Human work is what should be rewarded, in proportion to its quality.