| ▲ | kouteiheika 4 hours ago |
> I would go so far as to say the most restrictive license that the model is trained on should be applied to all model generated code.

That license is called "All Rights Reserved", in which case you wouldn't be able to legally use the output for anything. There are research models out there which are trained only on permissively licensed data (i.e. no "All Rights Reserved" data), but they're, colloquially speaking, dumb as bricks compared to the state of the art.

But I guess the funniest consequence of the "model outputs are a derivative work of their training data" position would be that it'd essentially wipe out (or at the very least force a revert to a pre-AI-era commit in) every open source project that may have included any AI-generated or AI-assisted code, which currently covers pretty much every major open source project out there. And it would also make it impossible to legally train any new model whose training data isn't strictly pre-AI, since otherwise you wouldn't know whether your training data is contaminated.
| ▲ | progval 4 hours ago | parent | next [-] |
> There are research models out there which are trained on only permissively licensed data

Models whose authors *tried* to train only on permissively licensed data. For example, https://huggingface.co/bigcode/starcoder2-15b was meant to be trained on a permissively licensed dataset, but the filtering was done only on the repository-level license, not the file level. Searching for "under the terms of the GNU General Public License" on https://huggingface.co/spaces/bigcode/search-v2 (back when it was working) showed it was trained on many files with a GPL header.
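The gap between the two filtering levels is easy to illustrate. Here's a minimal, hypothetical sketch (not StarCoder2's actual pipeline; the marker strings and file names are made up for illustration) of why a repository-level license check misses GPL-headed files, while a file-level check that inspects each file's own header catches them:

```python
# Hypothetical sketch: a repo marked permissive (e.g. MIT) at the
# repository level can still contain individual files carrying a GPL
# header (vendored code, copied snippets). File-level filtering has to
# look inside each file instead of trusting the repo's declared license.

GPL_MARKERS = (
    "under the terms of the GNU General Public License",
    "GNU Lesser General Public License",
)

def file_is_permissive(text: str) -> bool:
    """Return False if the file's own header suggests a GPL license."""
    header = text[:2000]  # license headers normally appear near the top
    return not any(marker in header for marker in GPL_MARKERS)

# Example repository declared MIT at the repo level:
mit_repo_files = {
    "util.py": "# MIT licensed helper\n...",
    "vendored.py": (
        "# This file is free software; you can redistribute it and/or\n"
        "# modify it under the terms of the GNU General Public License\n"
        "..."
    ),
}

# Repo-level filtering keeps everything; file-level filtering flags this:
flagged = [name for name, text in mit_repo_files.items()
           if not file_is_permissive(text)]
print(flagged)  # only the vendored GPL-headed file is caught
```

A real pipeline would need far more robust detection (license identification is fuzzy; headers vary wildly), which is part of why file-level filtering is genuinely hard to get right.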
| ▲ | kshri24 4 hours ago | parent | prev | next [-] |
I agree with your assessment. Which is why I was proposing a middle ground: an agreement set up between the model-training company and a collective of developers/artists et al., with a license under which they are compensated for their original work in perpetuity. A tiny % of the profits could be shared, which would be a form of UBI. This is fair not only because companies are using AI-generated output, but because developers themselves are also paying for and using AI-generated output that is trained on other developers' work. I would feel good (in my conscience) that I am not "stealing" someone else's effort and that they are being paid for it.
| ▲ | foota 3 hours ago | parent | prev [-] |
I don't know how far it would get, but I imagine a FAANG would be able to get the farthest here, by virtue of having mountains of corporate data that they have complete ownership over.