dahart 8 hours ago

In the 2024 preface:

> Copyright holders worry about how to exercise control over the use of "their" creative material for training models; but that begs the question of whether copyright holders ever had, or should have, a right to any such control. If a human can read a book and learn from it, and then write their own books, why shouldn't a computer?

There’s a small amount of irony in an article that discusses copyright and the invisible but critical context of information, then dismisses the context of copying when it comes to copyright and confuses what copyright protects. I’m certain the author knows that copyright does not protect ideas and does not protect “colour”; it deliberately protects only the “bits”. In US copyright law this is called the “fixation” of a work. The Berne Convention uses similar terminology: “works shall not be protected unless they have been fixed in some material form.”

AI’s “learning” has a different colour than human learning. This has been debated at length on HN and elsewhere, and in the courts, but it’s wildly misleading to compare ChatGPT training on all books ever written and then being distributed (for a profit) to everyone with one human reading one book and learning something from it.

aftbit 7 hours ago

More interesting to me is the "derivative work" concept. If a human sits down with a novel, reads it cover to cover, then writes their own novel which broadly has the same characters following the same plot in the same setting, but with slight differences in names and word choice, is that new work a derivative of the first for copyright purposes? What if they do the same thing for code? What if an AI does either or both of those?

IP courts will have some truly novel questions before them this century.

dahart 6 hours ago

Copyright does not cover ideas, period. If you write your own novel and use your own word choices, even if you copy the plot structure exactly and use the same character names, it’s not even considered a derivative work under the law; it’s a new work. Copyright covers copying the fixed work itself. You aren’t in violation of copyright unless you copy the words themselves.

The flip side is that this is why the article’s discussion about randomness and monkeys on typewriters is irrelevant to copyright law. It’s a copyright violation to produce the same “fixation” no matter how you do it. If you generated a random sequence of characters and it happened to match an NYT best-selling book, you would violate the book author’s copyright, and claiming it was random isn’t a viable defense. Intent to copy can make it worse, but lack of intent does not absolve. There is precedent for people independently coming up with the same song and one of them being successfully sued.

Do note that there are other laws that might cover plagiarism of ideas, trademarks, code, etc. Copyright isn’t the only consideration, but it seems to be often misunderstood. We definitely have some novel questions because of the scale of AI’s copying, the nature of training and the provenance of the training data, and because of AI’s growing ability to skirt copyright law while actually copying.

zarzavat 6 hours ago

That's not really true. Copyright protects named characters if they are sufficiently distinctive, though there's nothing to stop you from creating an identical character with a different name.

dahart 4 hours ago

It is really true, but yes there are some specific exceptions. In general: “Copyright protection does not extend to names, titles, short phrases, ideas, methods, facts, or systems.” https://www.copyright.gov/engage/writers/

You’re right that in certain limited circumstances, copyright will protect fictional characters. To protect a character, the character must be “well delineated”, and this has proven to be a pretty high bar. https://en.wikipedia.org/wiki/Copyright_protection_for_ficti...