dijksterhuis 7 months ago
i’ve been avoiding replying to your comment for a bit, and now i realise why.

edit: i am so sorry about the wall of text.

> some are also claiming that the output of a LLM should result in paying royalties back to the owning of the material used in training.

> So if an AI produces a sitcom script then the copyright holders of those tv shows it ingested should get paid royalties. In additional to the money paid to copy files around.

what you’re talking about here is the concept of “derivative works” made from other, source works. this is subtly different from reproduction of a work.

see the last half of this comment for my thoughts on the interesting question courts need to work out regarding verbatim reproduction: https://news.ycombinator.com/item?id=42282003

the derivative works case is slightly different. sampling in music is the best example i’ve got for this.

if i take four popular songs, cut 10 seconds from each, and then join the bits together to create a new track, that is a new, derivative work. but i have not sufficiently modified the source works. they are clearly recognisable. i am just using copyrighted material in a really obvious way. the core of my “new” work is actually just four reproductions of other people’s work.

in that case, that derivative work, under music copyright law, requires the original copyright holders to be paid for all usage and copying of their works. basically, a royalty split gets agreed, or there’s a court case, and then there’s a royalty split anyway (probably with some damages too).

in my case, when i make music with samples, i make sure i mangle and process those samples until the source work is no longer recognisable. i’ve legit made it part of my workflow. it’s no longer the original copyrighted work. it’s something completely new and fully unrecognisable.

the issue with LLMs, not just ChatGPT, is that they will produce output that is either a verbatim copy of, or recognisably similar to, the original source works.
the original copyrighted source work is clearly recognisable, even if the output is not an exact verbatim copy. that’s probably what you’ve seen folks talking about, at least it sounds like it to me.

> Which leads to the precedent that if a writer creates a sitcom then the copyright holders of sitcoms she watched should get paid for "training" her.

robin thicke, “blurred lines”:

* https://en.m.wikipedia.org/wiki/Pharrell_Williams_v._Bridgep...
* https://en.m.wikipedia.org/wiki/Blurred_Lines (scroll down)

yes, there is already some very limited precedent, at least for a narrow, specific case involving sheet music in the US. the TL;DR IANAL version of the question at hand in the case was “did the defendants write the song with the intention of replicating a hook from the plaintiff’s work?”. the jury decided that yes, they did.

this is different from your example in that they specifically set out to replicate that particular musical component of a song. in your example, you’re talking about someone having “watched” a thing one time and then having to pay royalties to those people as a result. that’s more akin to “being inspired” by something, which is protected under US law, i think (IANAL). it came up in blurred lines, but, well, yeah.

https://en.m.wikipedia.org/wiki/Idea%E2%80%93expression_dist...

again, where the red line between infringement and non-infringement falls is ultimately up to the courts to rule on.

anyway, this is very different from what OpenAI/ChatGPT is doing. OpenAI takes the works. ChatGPT edits them according to user requests (a feed-forward pass through the model). then the output is distributed to the user. and that output could be considered a derivative work (see the massive amount of text i wrote above, i’m sorry).

LLMs aren’t sitting there going “i feel like recreating a marvin gaye song”. an LLM takes data, encodes/decodes it, then produces an output. it is a mechanical process, not a creative one. there are no ideas here. no inspiration or expression. an LLM is not a human being.
it is a tool that creates outputs which are often strikingly similar to copyrighted source works. its users might be specifically asking it to replicate songs, though, in which case OpenAI could be facilitating copyright infringement (whether through derivative works or not). that’s an interesting legal question by itself: are they facilitating the production of derivative works through the copying of copyrighted source works? i would say they are. and, in some cases, the derivative works are obviously derived.