| ▲ | ben_w 9 hours ago | |
Everyone who studies linguistics will tell you the rules of language are descriptive, not prescriptive. This means that if people use the word "plagiarism" about an LLM, then LLMs are necessarily in the set of things that can do plagiarism, regardless of whether those same people would ever say this about a spanner. You can also think about it a different way: a book is a tool for storing and distributing information, and photocopying it is still plagiarism when done without attribution. Likewise, taking the output of an LLM, which is a tool for generating text in response to a prompt, and using it without attribution is as much plagiarism as if it came from a book. IMO, what matters most is that a lot of people want to know if and when some content came from an LLM rather than from a human. That makes attribution useful, which makes it important to get right. And that's still the case even if you object to the specific word "plagiarism". | ||
| ▲ | probably_wrong 8 hours ago | |
I don't think your example works, because in the book case there's a clear author whose ideas are being reproduced without permission. The LLM in your example is not the author but rather the printing press, and no one would argue that the printing press's ideas are being stolen, because the press doesn't have any. If one wants to argue that "not citing the LLM would be plagiarism", then we would have to find the human at the end of the chain whose ideas are being reproduced, which would require LLMs to output "this idea was seen in the following training documents". | ||