dijksterhuis 7 months ago

this isn’t the same.

> If you copied an art piece using photoshop, you would've violated copyright. Photoshop (and adobe) itself never committed copyright violations.

the COPYing is happening on your local machine with non-cloud versions of Photoshop.

you are making a copy, using a tool, and then distributing that copy.

in music royalty terms, making the copy falls under the mechanical right, while distributing the copy falls under the performing right.

and you are liable in this case.

> Somehow, if you swap photoshop with openAI and chatGPT, then people claim that the actual application itself is a copyright violation

OpenAI make a copy of the original works to create training data.

when the original works are reproduced verbatim (memorisation in LLMs is a thing), then that is the copyrighted work being distributed.

mechanical and performing rights, again.

but the twist is that ChatGPT does the copying on their servers and delivers it to your device.

they are creating a new copy and distributing that copy.

which makes them liable.

you are right that “ChatGPT” is just a tool.

however, the interesting legal grey area with this is — are ChatGPT model weights an encoded copy of the copyrighted works?

that’s where the conversation about the tool itself being a copyright violation comes in.

photoshop provides no mechanism to recite The Art Of War out of the box. an LLM could be trained to do so (like, it’s a hypothetical example but hopefully you get the point).

chii 7 months ago

> OpenAI make a copy of the original works to create training data.

if a user is allowed to download said copy to view on their browser, why isn't that same right given to openAI to download a copy to view for them? What openAI chooses to do with the viewed information is up to them - such as distilling summary statistics, or whatever.

> are ChatGPT model weights an encoded copy of the copyrighted works? that is indeed the most interesting legal gray area. I personally believe that it is not. The information distilled from those works do not constitute any copyrightable information, as it is not literary, but informational.

It's irrelevant that you could recover the original works from these weights - you could recover the same original works from the digits of pi!

dijksterhuis 7 months ago

heads up: you may want to edit your second quote

> if a user is allowed to download said copy to view on their browser, why isn't that same right given to openAI to download a copy to view for them?

whether you can download a copy from your browser doesn’t matter. whether the work is registered as copyrighted does (and following on from that, who is distributing the work - aka allowing you to download the copy - and for what purposes).

the article (on phone, cba to grab a quote) makes clear that the Intercept’s works were not registered as copyrighted works with the US Copyright Office.

ergo, those works are not copyrighted and, yes, they essentially are public domain and no remuneration is required …

(they cannot remove DMCA attribution information when distributing copies of the works though, which is what the case is now about.)

but for all the other registered works that OpenAI has downloaded (creating their own copy) and used in training data, which the model then reproduces as a memorised copy: that is copyright infringement.

like, in case it’s not clear, i’ve been responding to what people are saying about copyright specifically. not this specific case.

> The information distilled from those works do not constitute any copyrightable information, as it is not literary, but informational.

that’s one argument.

my argument would be it is a form of compression/decompression when the model weights result in memorised (read: overfitted) training data being regurgitated verbatim.

put the specific prompt in, you get the decompressed copy out the other end.

it’s like a zip file you download with a new album of music. except, in this case, instead of double clicking on the file you have to type in a prompt to get the decompressed audio files (or text in LLM case)
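
a toy sketch of that analogy, in case it helps. this is not how a real LLM works: a character-level lookup table stands in for the "weights", and the names (`train`, `generate`, `work`, `weights`) are made up for illustration. the point is only that a model overfitted to one text hands that text back verbatim when you give it the right prompt.

```python
from collections import defaultdict

def train(text, k=8):
    """Map every k-character context to the characters seen after it."""
    table = defaultdict(list)
    for i in range(len(text) - k):
        table[text[i:i + k]].append(text[i + k])
    return table

def generate(table, prompt, k=8, max_len=500):
    """Greedy decoding: always emit the most frequent next character."""
    out = prompt
    while len(out) < max_len:
        followers = table.get(out[-k:])
        if not followers:
            break  # unseen context: nothing memorised to continue with
        out += max(set(followers), key=followers.count)
    return out

# hypothetical "copyrighted work" used as the only training data
work = "the quick brown fox jumps over the lazy dog while the band plays on"
weights = train(work)

# the first 8 characters act as the prompt; the rest comes out of the table
copy = generate(weights, work[:8])
print(copy == work)
```

a prompt the table has never seen just comes back unchanged, so the "decompression" only happens for the specific prompt.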

> It's irrelevant that you could recover the original works from these weights - you could recover the same original works from the digits of pi!

actually, that’s the whole point of courts ruling on this.

the boundaries of what is considered reproduction are in question. it is up to the courts to decide on the red lines (probably blurry gray areas for a while).

if i specifically ask a model to reproduce an exact song… is that different to the model doing it accidentally?

i don’t think so. but a court might see it differently.

as someone who worked in music copyright, is a musician, and sees the effects of people stealing musicians’ efforts all the time, i hope the little guys come out of this on top.

sadly, they usually don’t.