gg80 7 hours ago

Mine is anecdotal evidence at best: I co-authored a fairly obscure book about the application of category theory to an extremely niche subject. There's basically no mention of the stuff in the book anywhere on the internet, nor in any academic publication I'm aware of. To have any idea of what's in the book, you need access to it. I couldn't remember some details of it and, being lazy and slightly curious, I tried asking a couple of models (one by OpenAI and one by Google). Both managed to give me extremely detailed answers based on the contents of the book. Nobody has ever asked me or any other person involved in the publication for permission to use the book in any kind of training (they may have bought the book but not the rights to reproduce it).

The funny thing is what happened when I told one of the models (the Google one) that I was one of the authors, that I had never consented to the book being used for its training, and that, since it was so willing to hand the book's contents to any user, nobody would have any reason to buy it. The thing told me that it had done so only because I was the author of the book (apparently my asking about the content of an obscure academic book was enough to make it statistically plausible that I was one of the two people who had read it, me and my co-author, excluding the editor a priori). It swore it would never have given that information to any other user.

I doubt that anyone could ever deny that LLMs are incredible tools that have incredible value. But denying that they have been made possible only thanks to egregious acts of piracy is disingenuous.