▲ | martin-t 5 days ago |
LLM providers should only be allowed to train on public domain data, or their models and outputs should inherit the license of the training data. And people should own all data about themselves, all rights reserved. It's ironic that copyright is the law that protects against this kind of abuse. And this is of course why big "AI" companies are trying to weaken it, by arguing that model training is not derivative work, or by claiming that writing a prompt in 2 minutes is enough creative work to own copyright of the output, despite the model being based on 10^12 hours of human work, give or take a few orders of magnitude.
▲ | j45 5 days ago | parent [-] |
Makes sense, but we have to deal with the cat being out of the bag. The groups that didn't train only on public domain content would have an advantage if this were implemented as a rule moving forward, at least for some time. New models following the rule could create a gap. I'm sure competition, as has been seen from open-source models, will be able to