zer00eyz 7 days ago
> It’s important in the fair use assessment to understand that the training itself is fair use, I think that this is a distinction many people miss.

If you take all the works of Shakespeare and reduce them to tokens and vectors, is it Shakespeare or is it factual information about Shakespeare? It is the latter, and as much as organizations like the MLB might want to be able to copyright a fact, you simply cannot do that.

Take this one step further. If you buy the work and vectorize it, that's fine. But if you feed in the vectors for Harry Potter so many times that the model can reproduce half of the book, it becomes a problem when it spits out that copy.

And what about all the other stuff that LLMs spit out? Who owns that? Well, at present, no one. If you train a monkey or an elephant to paint, you can't copyright that work because they aren't human, and neither is an LLM.

If you use an LLM to generate your code at work, can you leave with that code when you quit? Does GPLv3 or something like the Elasticsearch license even apply if there is no copyright?

I suspect we're going to be talking about court cases a lot for the next few years.
Imustaskforhelp 7 days ago
Yes. Someone on this post mentioned that Switzerland allows downloading copyrighted material but not distributing it.

So things get even darker, because what counts as distribution can have a really vague definition, and maybe the AI companies will only follow the law just barely, just for the sake of not getting hit with a lawsuit like this again. But I wonder if all this case did was compensate the authors this one time. I doubt we will see a meaningful change in AI companies' attitudes towards fair use and essentially exploiting authors.

I feel like they would try to use as much legalspeak as possible to extract as much from authors (legally) without compensating them, which I feel is unethical, but sadly the law doesn't work on ethics.
arcticfox 7 days ago
> And what about all the other stuff that LLMs spit out? Who owns that? Well, at present, no one. If you train a monkey or an elephant to paint, you can't copyright that work because they aren't human, and neither is an LLM.

This seems too cute by half. Courts generally apply the law with far more common sense than that. This is like saying that running `rails generate model Example` results in a bunch of code that isn't yours, because the tool generated it according to your specifications.
simoncion 7 days ago
> If you take all the works of Shakespeare and reduce them to tokens and vectors, is it Shakespeare or is it factual information about Shakespeare?

To rephrase the question: Is a PDF of the complete works of Shakespeare Shakespeare, or is it factual information about Shakespeare?

Re-encoding human-readable information into a form that's difficult for humans to read without machine assistance is nothing new.
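A minimal sketch of the re-encoding point, assuming a toy word-level tokenizer (the `build_vocab`, `encode`, and `decode` helpers are hypothetical and not any real LLM tokenizer). It only covers the tokenization step, not what training does with the tokens afterwards: the list of integer ids looks nothing like the text, yet it decodes back to exactly the same string.

```python
import re

# Hypothetical toy tokenizer: splits text into words, whitespace runs,
# and single punctuation characters. Real LLM tokenizers differ.
PIECE = re.compile(r"\w+|\s+|[^\w\s]")

def build_vocab(text):
    """Assign an integer id to each distinct piece, in order of first appearance."""
    vocab = {}
    for piece in PIECE.findall(text):
        vocab.setdefault(piece, len(vocab))
    return vocab

def encode(text, vocab):
    """Turn text into a list of integer token ids."""
    return [vocab[piece] for piece in PIECE.findall(text)]

def decode(ids, vocab):
    """Turn a list of token ids back into the original text."""
    inverse = {i: piece for piece, i in vocab.items()}
    return "".join(inverse[i] for i in ids)

text = "To be, or not to be, that is the question."
vocab = build_vocab(text)
ids = encode(text, vocab)

print(ids)                 # a list of integers, unreadable without the vocab
print(decode(ids, vocab))  # the original line, recovered character for character
```

In this sketch the encoded form is just another representation of the same work, much like the PDF.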
zmmmmm 7 days ago
I think the question is going to be how much human intellectual input there was. I don't think it will take much: you can write the crappiest novel on earth that is complete random drivel and you still have copyright on it. So to me, if you are doing literally any human review, editing, or control over the AI, then I think you'll retain copyright. There may be a risk that if somebody can show that they could produce exactly the same thing from a generic prompt with no interaction, then you may be in trouble, but let's face it, should you have copyright at that point?

This is, however, why I favor stopping slightly short of full agentic development at this point. I want the human watching each step and an audit trail of the human interaction in doing it. Sure, I might only get to 5x development speed instead of 10x or 20x, but that is already such an enormous step up from where we were a year ago that I am quite OK with that for now.
tomrod 7 days ago
I mean, sort of. The issue is that the compression is novel. So anything post-tokenization could arguably be considered a value add and not necessarily a derivative work.