Anamon 2 days ago
> Cloudflare has to block so many more bots now precisely because crawling the public, free-to-everyone, internet is legally not theft.

That is simply not true. Freely available on the web doesn't mean it's in the Public Domain.

The "lawfully obtained" part of your argument is patently untrue. You can legally obtain something, but that doesn't mean any use of it is automatically legal as well. Otherwise, the recent Spotify dump by Anna's Archive would be legal as well.

It all depends on the license the thing is released under, chosen by the person who made it freely accessible on the web. This license is still very emphatically a legally binding document that restricts what someone can do with it.

For instance, since the advent of LLM crawling, I've added the "No Derivatives" clause to the CC license of anything new I publish to the web. It's still freely accessible, can be shared on, etc., but it explicitly prohibits using it for training ML models. I even add an additional clause to that effect, should the legal interpretation of CC-ND ever change.

In short, anyone training an LLM on my content is infringing my rights, period.
ben_w 2 days ago
> Freely available on the web doesn't mean it's in the Public Domain.

Doesn't need to be.

> The "lawfully obtained" part of your argument is patently untrue. You can legally obtain something, but that doesn't mean any use of it is automatically legal as well.

I didn't say "any" use, I said this specific use. Here's the quote from the judge who decided this:

- https://storage.courtlistener.com/recap/gov.uscourts.cand.43...

> Otherwise, the recent Spotify dump by Anna's Archive would be legal as well.

I specifically said copyright infringement was separate. Because, guess what, so did the judge, in the next paragraph but one after the quote I just gave you.

> For instance, since the advent of LLM crawling, I've added the "No Derivatives" clause to the CC license of anything new I publish to the web. It's still freely accessible, can be shared on, etc., but it explicitly prohibits using it for training ML models. I even add an additional clause to that effect, should the legal interpretation of CC-ND ever change. In short, anyone training an LLM on my content is infringing my rights, period.

It will be interesting to see if that holds up in future court cases. I wouldn't bank on it if I was you.