DexesTTP an hour ago
The latter. It's a tool: if using data is necessary to make the tool work, then its output derives from the data. If the LLM's output is not derivative of its training data, then why would it need the training data in the first place?
Gormo an hour ago
> It's a tool, if using data is necessary to make the tool work, then its output derives from the data.

That's simply not correct within the applicable meaning of "derives" as understood in copyright law. In fact, data per se is not even within the scope of copyright protection in the first place: specific published works are copyrighted, but the underlying ideas and facts that they convey are not.

Even works that merely draw on a single source of data, but express the ideas drawn from it in a new or transformative way, are not considered derivative works (see the ruling in Google v. Oracle, for example), let alone works based on patterns extrapolated by relating together ideas sourced from many distinct works, which is what LLMs are principally doing.

If you applied the principle you're proposing here to human developers, you'd conclude that any code written by someone who learned to program by studying techniques used in FOSS would in turn be a derivative work of that software. No one has ever regarded this as the case.