selridge | 4 hours ago
This is speculation, but I suspect the training-data argument is going to be a real loser in the courtroom. We're moving out of the regime where memorization is a major failure mode for frontier models, and they're increasingly trained on synthetic text whose copyright status is very hard to pin down. So far, no one has successfully sued over software copyright with LLMs, and we haven't seen a user of one of these models sued over its output either.

Maybe we converge on the US Copyright Office's view, which is that none of this can be protected. I kind of like that as a future for software engineers, because it forces them all, at long last, to become rules lawyers. If machine-generated code gets no copyright protection, there may be a cottage industry of folks who provide a reliably human layer that is copyrightable. Like Boeing, they'll have to write to the regulator, not to the spec. That feels like a suitable destination for a discipline that's had it too good for too long.