quectophoton 7 months ago

Even ignoring the fact that programmatic access to translation seems to require payment, or that its parent company is doing the scraping (similar to how one would use CommonCrawl instead of doing the scraping themselves), I am actually in favor of taking into account the intent behind it.

"Give and take", "equal exchange", however people want to put it. I don't mind if someone uses publicly-accessible content and ignores its copyright to make another thing, as long as their result is publicly-accessible and they're prepared to have their copyright ignored in return. If you not only use the result of someone else, but also their process, then be prepared to have your process publicly-accessible too, with its copyright ignored. And so on.

That's why I don't mind "unofficial" translations or subtitles (both copyright violations as soon as they are distributed) appearing on multiple sites. That's why I respect open-source licenses of projects that respect them. That's why I pay for some open-source software even if I don't have to. That's why I give credit to artists when I use an image that I didn't make myself as a profile picture (either from the internet or because I paid for it).

That's also why I don't mind anyone ignoring my copyright as long as it's on "equal" terms ("if you vendor my code and pass it off as yours, that's tacit approval for someone else doing the same thing to you" kind of thing ("someone else" because, at least for code, it won't be me)).

I only gave very specific examples, but I hope I was able to explain what I mean.

The thing that I don't like is the highly asymmetrical situation we're in with generative AI: the result (the trained model) is not publicly accessible, unlike a significant part of the content it was trained on; they only release a very limited interface to it.

kyledrake 6 months ago

> Even ignoring the fact that programmatic access to translation seems to require payment, or that its parent company is doing the scraping (similar to how one would use CommonCrawl instead of doing the scraping themselves), I am actually in favor of taking into account the intent behind it.

Does intent matter for the purposes of interpreting the laws here? I'm not criticizing your point, I'm genuinely curious if that matters (outside the context of fair use). I can certainly think of valid use cases that would not be considered fair use.

> The thing that I don't like is the highly asymmetrical situation we're in with generative AI: the result (the trained model) is not publicly accessible, unlike a significant part of the content it was trained on; they only release a very limited interface to it.

I'm not sure that I agree with this one, given that most serious LLMs are free or very low cost to use, and in the case of Llama and Phi-3, pretty much just given away. That's not a small gesture, given the substantial expenses required to provide free access to some of these models.