friendzis 6 hours ago

> The GFDL imposes restrictions on distribution, not copying, so merely downloading a copy imposes no obligation on you and so isn't a copyright infringement either.

The restrictions fall not only on verbatim distribution, but on derivative works too. I am not aware whether model outputs are settled to be or not to be (hehe) derivative works in a court of law, but that question is at the very least very much valid.

mcherm 4 hours ago | parent [-]

It's the third sentence of the article:

> the district court ruled that using the books to train LLMs was fair use but left for trial the question of whether downloading them for this purpose was legal.

friendzis 4 hours ago | parent [-]

No, those are separate issues.

The pipeline is something like: download material -> store material -> train models on material -> store models trained on material -> serve output generated from models.

These questions focus on the inputs to the model training; the question I have raised focuses on the outputs of the model. If [certain] outputs are considered derivative works of input material, then we have a cascade of questions about which parts of the pipeline are covered by the license requirements. Even if any of the upstream parts of this simplified pipeline are considered legal, it does not imply that the rest of the pipeline is compliant.

superxpro12 2 hours ago | parent [-]

Consider the net effect and the answer is clear. When these models are properly "trained", are people going to look for the book or a derivative of it, with proper attribution?

Or is the LLM going to regurgitate the same content with zero attribution, and shift all the traffic away from the original work?

When viewed in this frame, it is obvious that the work is derivative and then some.

limagnolia 23 minutes ago | parent [-]

That is your opinion, but the judge disagreed with you. The decision may be overturned on appeal, but as it stands, in that courtroom, the training was fair use.

seba_dos1 a minute ago | parent | next [-]

[delayed]

integralid 3 minutes ago | parent | prev [-]

This is also, unfortunately, the only way this can be settled. Making LLM output legally a derivative work would murder the AI gold rush, and nobody wants that.