jmyeet 5 hours ago
There's a subtext in your point that I want to expand on. Tech people, particularly engineers, tend to make a fundamental error when dealing with the law that almost always leads them to the wrong conclusions. That error is that they look for technical compliance when so much of the law is subjective and holistic.

An example I like to use is people who do something illegal on the Internet and then argue "you can't prove I did it (with absolute certainty)". It could've been someone who hacked your Wifi. You don't know who on the Wifi did it, etc. But the law will look at the totality of the evidence. Did the activity occur when you were at home and stop when you weren't? How likely are alternative explanations? Etc. All of that will be weighed against a legal standard that depends on the venue. In civil court that tends to be "the preponderance of the evidence" (meaning more likely than not), while in criminal court it's "beyond a reasonable doubt" (which is a much higher standard).

So, using your example, an engineer will often fall into the trap of thinking they can substitute enough words to end up with a new original work, Ship of Theseus-like. The law simply doesn't work that way.

So, when this gets to a court (which it will, it's not a question of "if"), the court will consider how necessary the source work was to what you did. If you used it for a direct translation (eg from C++ to Go) then you're going to lose. My prediction is that even using it in training data will be cause for a copyright claim.

If you use Moby Dick in your training data and ask an LLM to write a book like Moby Dick (either explicitly or implicitly) then you're going to have an issue. Even if you split responsibilities so one LLM (trained on Moby Dick) comes up with a structure/prompt and another LLM (not trained on Moby Dick) writes it, I don't think that'll really help you avoid the issue.
sumtechguy 3 hours ago
> So, when this gets to a court (which it will, it's not a question of "if"), the court will consider how necessary the source work was to what you did. If you used it for a direct translation (eg from C++ to Go) then you're going to lose. My prediction is that even using it in training data will be cause for a copyright claim.

This has a lot of similarity to when colorization of film started popping up. Did colorizing black and white movies suddenly change the copyright of the film? At this point it seems the courts mostly say no, but you may sometimes find a ruling that goes the other way and says yes. It takes time and a lot of effort to get to what people in general want.

Basically, if you start with a 'spec' and then make something, you can probably get a wholly owned new thing. If you start with the old thing and just transform it in some way, you can do that, but the original copyright holders still have rights to the thing you mangled too. If I remember right they called it 'color of copyright' or something like that.

On the LLM bits you are probably right, but that has not been worked out by the law or the courts yet. So the courts may make up new case law around it, or the lawmakers might get ahead of it and say something (unlikely).
npongratz 3 hours ago
> And that error is that they look for technical compliance when so much of the law is subjective and holistic.

I know it sounds like an oversimplification, but "got off on a technicality" is a common thing among the well-connected and well-heeled. Sure, us nerds probably focus too much on the "technicality" part, since we are by definition technical, but the rest comes across as wishy-washy, unfair BS given the way many of our brains work much of the time.