CaptainFever 7 months ago

I won't necessarily argue against that moral view, but in this case it is two large corporations fighting. One has the power of tech, the other has the power of the state (copyright). So I don't think that applies in this case specifically.

Xelynega 7 months ago

Aren't you ignoring that common law is built on precedent? If they win this case, that makes it a lot easier for people whose copyright is being infringed at an individual level to get justice.

CaptainFever 7 months ago

You're correct, but I think many don't realize how many small model trainers and fine-tuners there are currently. For example, PonyXL, or the many models and fine-tunes on CivitAI made by hobbyists.

So basically the reasoning is this:

- NYT vs OpenAI: neither is disenfranchised

- OpenAI vs individual creators: creators are disenfranchised

- NYT vs individual model trainers: model trainers are disenfranchised

- Individual model trainers vs individual creators: neither is disenfranchised

And if only one can win, and since the view is that information should be free, it biases the argument towards the model trainers.

AlienRobot 7 months ago

What "information" are you talking about? It's a text and image generator.

Your argument is that it's okay to scrape content when you are an individual. It doesn't change the fact those individuals are people with technical expertise using it to exploit people without.

If they wrote a bot to annoy people but published how many people got angry about it, would you say it's okay because that is information?

You need to draw the line somewhere.

CaptainFever 7 months ago

Text and images are information, though.

> If they wrote a bot to annoy people but published how many people got angry about it, would you say it's okay because that is information?

Kind of? It's not okay, but not because it uses information without consent (this is the "information should be free" part). It's not okay because it intentionally and unnecessarily annoys and angers people (this is the "don't use the information for evil" part, which I think is your position).

"See? Similarly, even in your view, model trainers aren't bad because they're using data. They're bad in general because they're exploiting creatives."

But why is it exploitative?

"They're putting the creatives out of a job." But this applies to automation in general.

"They're putting creatives out of a job, using data they created." This is the strongest argument for me. It does intuitively feel exploitative. However, there are several issues:

1. Not all models or datasets do that. For instance, no one is visibly getting paid to write comments on HN, or to write fanfics on the non-commercial fanfic site AO3. Since the data creators are not doing it as a job in the first place, it does not make sense to talk about them losing their job because of the very same data.

2. Not all models or datasets do that. For example, spam filters and AI classifiers. All of these can be trained on the entire Internet without being exploitative, because there is no job replacement involved.

3. Some models already do that, and are already well and morally accepted. For example, Google Translate.

4. This may be resolved by going the other way and making more models open source (or even leaked), so more creatives can use them freely and benefit from their productive power.

"Because they're using creatives' information without consent." But as mentioned, it's not about the information or consent. It's about what you do with the information.

Finally, because this is a legal case, it's also important to talk about the morality of using the state to restrict people from using information freely, even if their use of the information is morally wrong.

If you believe in free culture as in free speech, then it is wrong to restrict such a use by law, even though we might agree the use is morally wrong. But this really depends on whether you believe in free culture as in free speech in the first place, which is a much larger debate than this one.