Remix.run Logo
chromacity 3 hours ago

Conversely, some people feel entitled to 100% of the value created by others. Oh, you wrote a book? Too bad, it's a part of my training data set now.

Downloading public stuff off the internet with no regard for the creator's wishes or license is bad enough, but we have many people here who defended AI companies seeding models with pirated content.

The internet is a social contract. AI is not the first thing to try and erode it for profit, but it's by far the most aggressive one.

pc86 an hour ago | parent [-]

Putting a book into a training data set does not take 100% of the value created by the author. You could make a convincing argument that since the LLM was never going to purchase the book, and the number of people who would have purchased the book but now won't because it's included in the training data is effectively zero, that no value was lost at all.

Licenses are legal documents and are usually treated as such, but "the creator's wishes" are irrelevant without case law, legislation, or licensing to back it up. And jurisdiction - show me a license that doesn't stand up in court in my home jurisdiction and I'll show you a license I won't care if I break or not.