QuadmasterXLII 4 hours ago
“Mr Teacher, how many words do I have to change after copy-pasting Wikipedia so it's not plagiarism?” has grown up and entered the workforce. Pin your dependency versions, people! With hashes at this point, since you can't trust anybody out here.
jmyeet 4 hours ago
There's a subtext in your point that I want to expand on. Tech people, particularly engineers, tend to make a fundamental error when dealing with the law that almost always leads them to the wrong conclusions. That error is looking for technical compliance when so much of the law is subjective and holistic.

An example I like to use is people who do something illegal on the Internet and then argue "you can't prove I did it (with absolute certainty)". It could've been someone who hacked your Wi-Fi. You don't know who on the Wi-Fi did it, etc. But the law will look at the totality of the evidence. Did the activity occur when you were at home and stop when you weren't? How likely are the alternative explanations? All of that will be weighed against some legal standard depending on the venue. In civil court that tends to be "the preponderance of the evidence" (meaning more likely than not), while in criminal court it's "beyond a reasonable doubt" (which is a much higher standard).

So, using your example, an engineer will often fall into the trap of thinking they can substitute enough words to have a new original work, Ship of Theseus-like. The law simply doesn't work that way.

So, when this gets to a court (which it will; it's not a question of "if"), the court will consider how necessary the source work was to what you did. If you used it for a direct translation (e.g. from C++ to Go) then you're going to lose.

My prediction is that even using it in training data will be cause for a copyright claim. If you use Moby Dick in your training data and ask an LLM to write a book like Moby Dick (either explicitly or implicitly) then you're going to have an issue. Even if you split responsibilities so that one LLM (trained on Moby Dick) comes up with a structure/prompt and another LLM (not trained on Moby Dick) writes it, I don't think that'll really help you avoid the issue.