dijksterhuis 7 months ago

> Should I be paying a proportion of my salary to all the copyright holders of the books, song, TV shows and movies I consumed during my life?

you already are.

a proportion of what you pay for books, music, tv shows, movies goes to rights holders already.

any subscription to spotify/apple music/netflix/hbo; any book/LP/CD/DVD/VHS; any purchased digital download … a portion of those sales is paid back to rights holders.

so… i’m not entirely sure what your comment is trying to argue for.

are you arguing that you should get paid a rebate for your salary that’s already been spent on copyright payments to rights holders?

> If a Hollywood writer says she "learnt a lot about writing by watching the Simpsons" will Fox have an additional claim on her earnings?

no. that’s not how copyright functions.

the actual episodes of the simpsons are the copyrighted work.

broadcasting those episodes, or selling copies of them, implicates copyright because it involves COPYING the material itself.

COPYright is about the rights of the rights holder when their work is COPIED, where a “work” is the material which the copyright applies to.

merely mentioning the existence of a tv show involves zero copying of a registered work.

being inspired by another TV show to go off and write your own tv show involves zero copying of the work.

a hollywood writer rebroadcasting a simpsons episode during a TV interview would be a different matter. same with the hollywood writer just taking scenes from a simpsons episode and putting them into their film. that’s COPYing the material.

---

when it comes to open AI, obviously this is a legal gray area until courts start ruling.

but the accusations are that OpenAi COPIED the intercept’s works by downloading them.

openAi transferred the work to openAi servers. they made a copy. and now openAi are profiting from that copy of the work that they took, without any permission or remuneration for the rights holder of the copyrighted work.

essentially, openAI did what you’re claiming is the status quo for you… but it’s not the status quo for you.

so yeah, your comment confuses me. hopefully you’re being sarcastic and it’s just gone completely over my head.

slyall 7 months ago | parent | next [-]

The problem is that the anti-AI people are objecting to several different steps in the chain (and they are often vague about which step they are talking about at any point).

As well as the "copying" of content, some are also claiming that the output of an LLM should result in royalties being paid back to the owners of the material used in training.

So if an AI produces a sitcom script then the copyright holders of the tv shows it ingested should get paid royalties. In addition to the money paid to copy files around.

Which leads to the precedent that if a writer creates a sitcom then the copyright holders of sitcoms she watched should get paid for "training" her.

jashmatthews 7 months ago | parent | next [-]

When humans learn and copy too closely we call that plagiarism. If an LLM does it how should we deal with that?

chii 7 months ago | parent [-]

> If an LLM does it how should we deal with that?

why not deal with it the same way as humans have been dealt with in the past?

If you copied an art piece using photoshop, you would've violated copyright. Photoshop (and adobe) itself never committed copyright violations.

Somehow, if you swap photoshop with openAI and chatGPT, then people claim that the actual application itself is a copyright violation.

dijksterhuis 7 months ago | parent [-]

this isn’t the same.

> If you copied an art piece using photoshop, you would've violated copyright. Photoshop (and adobe) itself never committed copyright violations.

the COPYing is happening on your local machine with non-cloud versions of Photoshop.

you are making a copy, using a tool, and then distributing that copy.

in music royalty terms, the making a copy is the Mechanical right, while distributing the copy is the Performing right.

and you are liable in this case.

> Somehow, if you swap photoshop with openAI and chatGPT, then people claim that the actual application itself is a copyright violation

OpenAI make a copy of the original works to create training data.

when the original works are reproduced verbatim (memorisation in LLMs is a thing), then that is the copyrighted work being distributed.

mechanical and performing rights, again.

but the twist is that ChatGPT does the copying on their servers and delivers it to your device.

they are creating a new copy and distributing that copy.

which makes them liable.

you are right that “ChatGPT” is just a tool.

however, the interesting legal grey area with this is — are ChatGPT model weights an encoded copy of the copyrighted works?

that’s where the conversation about the tool itself being a copyright violation comes in.

photoshop provides no mechanism to recite The Art Of War out of the box. an LLM could be trained to do so (like, it’s a hypothetical example but hopefully you get the point).

chii 7 months ago | parent [-]

> OpenAI make a copy of the original works to create training data.

if a user is allowed to download said copy to view on their browser, why isn't that same right given to openAI to download a copy to view for them? What openAI chooses to do with the viewed information is up to them - such as distilling summary statistics, or whatever.

> are ChatGPT model weights an encoded copy of the copyrighted works?

that is indeed the most interesting legal gray area. I personally believe that it is not. The information distilled from those works does not constitute any copyrightable information, as it is not literary, but informational.

It's irrelevant that you could recover the original works from these weights - you could recover the same original works from the digits of pi!

dijksterhuis 7 months ago | parent [-]

> if a user is allowed to download said copy to view on their browser, why isn't that same right given to openAI to download a copy to view for them?

whether you can download a copy from your browser doesn’t matter. whether the work is registered as copyrighted does (and following on from that, who is distributing the work - aka allowing you to download the copy - and for what purposes).

the article (on phone, cba to grab a quote) makes clear that the Intercept’s works were not registered with the US Copyright Office.

ergo, those works can’t be the basis of a copyright infringement claim (registration is required before you can sue in the US) and, yes, no remuneration is required …

(they cannot remove DMCA attribution information when distributing copies of the works though, which is what the case is now about.)

but for all the other registered works that OpenAI has downloaded (creating their own copy), used as training data, and which the model then reproduces as a memorised copy — that is copyright infringement.

like, in case it’s not clear, i’ve been responding to what people are saying about copyright specifically. not this specific case.

> The information distilled from those works does not constitute any copyrightable information, as it is not literary, but informational.

that’s one argument.

my argument would be it is a form of compression/decompression when the model weights result in memorised (read: overfitted) training data being regurgitated verbatim.

put the specific prompt in, you get the decompressed copy out the other end.

it’s like a zip file you download with a new album of music. except, in this case, instead of double clicking on the file you have to type in a prompt to get the decompressed audio files (or text, in the LLM case).
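
to make the analogy concrete, here’s a toy sketch. the MEMORISED table and generate() function are made up purely for illustration; real model weights are obviously not a literal lookup table. the point is only the behaviour: a specific prompt goes in, a verbatim memorised snippet comes out, like unzipping an archive.

    # toy "model": an overfitted lookup table keyed by prompts (illustrative only)
    MEMORISED = {
        "recite the opening line of moby dick": "Call me Ishmael.",
        # ... imagine thousands of memorised training snippets here
    }

    def generate(prompt: str) -> str:
        # verbatim regurgitation if the prompt hits a memorised snippet,
        # otherwise some blended, paraphrased output
        return MEMORISED.get(prompt, "some novel-ish remix of the training data")

    print(generate("recite the opening line of moby dick"))  # -> Call me Ishmael.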

> It's irrelevant that you could recover the original works from these weights - you could recover the same original works from the digits of pi!

actually, that’s the whole point of courts ruling on this.

the boundaries of what is considered reproduction are the question. it is up to the courts to decide on the red lines (probably blurry gray areas for a while).

if i specifically ask a model to reproduce an exact song… is that different to the model doing it accidentally?

i don’t think so. but a court might see it differently.

as someone who worked in music copyright, is a musician, and sees the effects of people stealing musicians’ efforts all the time, i hope the little guys come out of this on top.

sadly, they usually don’t.

dijksterhuis 7 months ago | parent | prev [-]

i’ve been avoiding replying to your comment for a bit, and now i realised why.

edit: i am so sorry about the wall of text.

> some are also claiming that the output of an LLM should result in royalties being paid back to the owners of the material used in training.

> So if an AI produces a sitcom script then the copyright holders of the tv shows it ingested should get paid royalties. In addition to the money paid to copy files around.

what you’re talking about here is the concept of “derivative works” made from other, source works.

this is subtly different to reproduction of a work.

see the last half of this comment for my thoughts on the interesting thing courts need to work out regarding verbatim reproduction: https://news.ycombinator.com/item?id=42282003

in the derivative works case, it’s slightly different.

sampling in music is the best example i’ve got for this.

if i take four popular songs, cut 10 seconds of each, and then join each of the bits together to create a new track — that is a new, derivative work.

but i have not sufficiently modified the source works. they are clearly recognisable. i am just using copyrighted material in a really obvious way. the core of my “new” work is actually just four reproductions of the work of other people.

in that case — that derivative work, under music copyright law, requires the original rights holders to be paid for all usage and copying of their works.

basically, a royalty split gets agreed, or there’s a court case. and then there’s a royalty split anyway (probably some damages too).

in my case, when i make music with samples, i make sure i mangle and process those samples until the source work is no longer recognisable. i’ve legit made it part of my workflow.

it’s no longer the original copyrighted work. it’s something completely new and fully unrecognisable.

the issue with LLMs, not just ChatGpt, is that they will produce output that is sometimes a verbatim copy of an original source work, and sometimes recognisably similar to one.

the original source copyrighted work is clearly recognisable, even if not an exact verbatim copy.

and that’s what you’ve probably seen folks talking about, at least it sounds like it to me.

> Which leads to the precedent that if a writer creates a sitcom then the copyright holders of sitcoms she watched should get paid for "training" her.

robin thicke “blurred lines” —

* https://en.m.wikipedia.org/wiki/Pharrell_Williams_v._Bridgep...

* https://en.m.wikipedia.org/wiki/Blurred_Lines (scroll down)

yes, there is already some very limited precedent, at least for a narrow specific case involving sheet music in the US.

the TL;DR IANAL version of the question at hand in the case was “did the defendants write the song with the intention of replicating a hook from the plaintiff’s work”.

the jury decided, yes they did.

this is different to your example in that they specifically went out to replicate that specific musical component of a song.

in your example, you’re talking about someone having “watched” a thing one time and then having to pay royalties to those people as a result.

that’s more akin to “being inspired by”, which i think is protected under US law (IANAL) via the idea-expression distinction. it came up in blurred lines, but, well, yeah. https://en.m.wikipedia.org/wiki/Idea%E2%80%93expression_dist...

again, the red line of infringement / not infringement is ultimately up to the courts to rule on.

anyway, this is very different to what openAi/chatGpt is doing.

openAi takes the works. chatgpt edits them according to user requests (feed forward through the model). then the output is distributed to the user. and that output could be considered to be a derivative work (see massive amount of text i wrote above, i’m sorry).

LLMs aren’t sitting there going “i feel like recreating a marvin gaye song”. they take data, encode/decode it, then produce an output. it is a mechanical process, not a creative one. there are no ideas here. no inspiration or expression.

an LLM is not a human being. it is a tool, which creates outputs that are often strikingly similar to source copyrighted works.

their users might be specifically asking to replicate songs though. in which case, openAi could be facilitating copyright infringement (whether through derivative works or not).

and that’s an interesting legal question by itself. are they facilitating the production of derivative works through the copying of copyrighted source works?

i would say they are. and, in some cases, the derivative works are obviously derived.

Suppafly 7 months ago | parent | prev [-]

>a proportion of what you pay for books, music, tv shows, movies goes to rights holders already.

When I borrow a book from a friend, how do the original authors get paid for that?

dijksterhuis 7 months ago | parent [-]

they don’t.

borrowing a book is not creating a COPY of the book. you are not taking the pages, reproducing all of the text on those pages, and then giving that reproduction to your friend.

that is what a COPY is. borrowing the book is not a COPY. you’re just giving them the thing you already bought. it is a temporary transfer of possession, not a copy.

if you were copying the files from a digitally downloaded album of music and giving those new copies to your friend (music royalties were my specialty) then technically you would be in breach of copyright. you have copied the works.

but because it’s such a small scale (an individual with another individual) it’s not going to be financially worth it to take the case to court.

so copyright holders just cut their losses with one friend sharing it with another friend, and focus on other infringements instead.

which is where the whole torrenting thing comes in. if i can track 7000 people who have all downloaded the same torrented album, now i can just send a letter / court date to those 7000 people.

the costs of enforcement are reduced because of scale. 7000 people, all found doing the same thing, in a way that can be tracked.

and then the big one: one person/company has downloaded the works and is making them available for others to download, without paying for the rights to make copies when distributing.

that’s the ultimate goldmine for copyright infringement lawsuits. and it sounds suspiciously like openAi’s business model.

Suppafly 7 months ago | parent [-]

>borrowing a book is not creating a COPY of the book. you are not taking the pages, reproducing all of the text on those pages, and then giving that reproduction to your friend.

That's not what's happening with training AI models either though.

dijksterhuis 7 months ago | parent [-]

check out my other comment in this thread about derivative works.

https://news.ycombinator.com/item?id=42282443

OpenAI are taking copies of people’s data. some of that is copyrighted data.

that’s copyright infringement.

an LLM is a tool to create derivative works from the data OpenAI has copied without permission (when considering only copyrighted works and nothing public domain).

derivative works can also be considered copyright infringement in some cases.

how the tool functions is irrelevant for the most part. how copyright infringement occurs doesn’t matter. only that it does.