heads up: you may want to edit your second quote
—
> if a user is allowed to download said copy to view on their browser, why isn't that same right given to openAI to download a copy to view for them?
whether you can download a copy from your browser doesn’t matter. whether the work is registered as copyrighted does (and following on from that, who is distributing the work - aka allowing you to download the copy - and for what purposes).
from the article (on phone, cba to grab a quote): it makes clear that the Intercept’s works were not registered with the U.S. Copyright Office.
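that doesn’t actually make them public domain (in the US, copyright exists from the moment a work is created), but registration is generally needed before you can sue for infringement, so in practice no remuneration can be claimed that way …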
(OpenAI still can’t strip the copyright management information, e.g. author and title, that the DMCA protects when distributing copies of the works, which is what the case is now about.)
but for all the other registered works that OpenAI has downloaded (creating its own copy), used as training data, and which the model then reproduces as a memorised copy: that is copyright infringement.
like, in case it’s not clear, i’ve been responding to what people are saying about copyright specifically. not this specific case.
> The information distilled from those works do not constitute any copyrightable information, as it is not literary, but informational.
that’s one argument.
my argument would be it is a form of compression/decompression when the model weights result in memorised (read: overfitted) training data being regurgitated verbatim.
put the specific prompt in, you get the decompressed copy out the other end.
it’s like a zip file you download with a new album of music. except, in this case, instead of double clicking on the file you have to type in a prompt to get the decompressed audio files (or text, in the LLM case).
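a rough sketch of the analogy in python, purely illustrative. the `SONG` text is made up, and the "model" here is a hypothetical lookup table from prefix to next character, which is not how a real LLM stores anything. it only shows the shape of the argument: in both cases the stored artefact doesn’t look like the work, but the right input gets the verbatim copy back out.

```python
import zlib

# made-up "song" standing in for a copyrighted work (hypothetical text)
SONG = "verse one of a song nobody wrote\nchorus that repeats, chorus that repeats\n"

# 1) the zip-file route: compress, ship the blob, decompress on the other end
archive = zlib.compress(SONG.encode("utf-8"))
unzipped = zlib.decompress(archive).decode("utf-8")

# 2) a toy "overfitted model": for every prefix of the training text,
#    remember the single character that followed it
def train(text):
    return {text[:i]: text[i] for i in range(1, len(text))}

def generate(weights, prompt):
    out = prompt
    # keep appending the memorised next character until nothing is known
    while out in weights:
        out += weights[out]
    return out

weights = train(SONG)
regurgitated = generate(weights, "verse one")

print(unzipped == SONG)      # did the archive give the same copy back?
print(regurgitated == SONG)  # did the prompt give the same copy back?
```

a prompt that isn’t the start of the memorised text just comes back unchanged, which is the "specific prompt" part of the analogy.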
> It's irrelevant that you could recover the original works from these weights - you could recover the same original works from the digits of pi!
actually, that’s the whole point of courts ruling on this.
the boundaries of what is considered reproduction are in question. it is up to the courts to decide on the red lines (probably blurry gray areas for a while).
if i specifically ask a model to reproduce an exact song… is that different to the model doing it accidentally?
i don’t think so. but a court might see it differently.
as someone who worked in music copyright, is a musician, and sees the effects of people stealing musicians’ efforts all the time, i hope the little guys come out of this on top.
sadly, they usually don’t.