| ▲ | martin-t 3 days ago | ||||||||||||||||||||||||||||||||||||||||||||||
And this is just the stuff you notice. 1) Verbatin copy is first-order plagiarism. 2a) Second-order plagiarism of written text would be replacing words with synonyms. Or taking a book paragraph by paragraph and for each one of them, rephrasing it in your own words. Yes, it might fool automated checkers but the structure would still be a copy of the original book. And most importantly, it would not contain any new information. No new positive-sum work was done. It would have no additional value. Before LLMs almost nobody did this because the chance that it would help in a lawsuit vs the amount of work was not a good tradeoff. Now it is. But LLMs can do "better": 2b) A different kind of second-order plagiarism is using multiple sources and plagiarizing each of them only in part. Find multiple books on the same topic, take 1 chapter from each and order them in a coherent manner. Make it more granular. Find paragraphs or phrases which fit into the structure of your new book but are verbatim from other books. See how granular you can make it. The trick here is that doing this by hand is more work than just writing your own book. So nobody did it and copyright law does not really address this well. But with LLMs, it can be automated. You can literally instruct an LLM to do this and it will do it cheaper than any human could. However, how LLMs work internally is yet different: n) Higher-order plagiarism is taking multiple source books, identifying patterns, and then reproducing them in your "new" book. If the patterns are sufficiently complex, nobody will ever be able to prove what specifically you did. What previously took creative human work now became a mechanical transformation of input data. The point is this ability to detect and reproduce patterns is an impressive innovation but it's built on top of the work of hundreds of millions[0] of humans whose work was used without consent. The work done by those employed by the LLM companies is minuscule compared to that. Yet all of the reward goes to them. Not to mention LLMs completely defear the purpose of (A)GPL. If you can take AGPL code and pass it through a sufficiently complex mechanical transformation that the output does the same thing but copyright no longer applies, then free software is dead. No more freedom to inspect and modify. [0]: Github alone has 100 million users ( https://expandedramblings.com/index.php/github-statistics/ ) and we have reason to believe all of their data was used in training. | |||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | jacquesm 3 days ago | parent | next [-] | ||||||||||||||||||||||||||||||||||||||||||||||
If a human did 2a or 2b we would think that a larger infraction than (1) because it shows intent to obfuscate the origins. As for your free software is dead argument: I think it is worse than that: it takes away the one payment that free software authors get: recognition. If a commercial entity can take the code, obfuscate it and pass it off as their own copyrighted work to then embrace and extend it then that is the worst possible outcome. | |||||||||||||||||||||||||||||||||||||||||||||||
| |||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | fc417fc802 2 days ago | parent | prev [-] | ||||||||||||||||||||||||||||||||||||||||||||||
You make several good points, and I appreciate that they appear well thought out. > What previously took creative human work now became a mechanical transformation of input data. At which point I find myself wondering if there's actually a problem. If it was previously permitted due to the presence of creative input, why should automating that process change the legal status? What justifies treating human output differently? > then free software is dead. No more freedom to inspect and modify. It seems to me that depends on the ideological framing. Consider a (still entirely hypothetical) world where anyone can receive approximately any software they wish with little more than a Q&A session with an expert AI agent. Rather than free software being dead, such a scenario would appear to obviate the vast majority of needs that free software sets out to serve in the first place. It seems a bit like worrying that free access to a comprehensive public transportation service would kill off a ride sharing service. It probably would, and the end result would also probably be a net benefit to humanity. | |||||||||||||||||||||||||||||||||||||||||||||||
| |||||||||||||||||||||||||||||||||||||||||||||||