fweimer 8 hours ago
I saw weird results with Gemini 2.5 Pro when I asked it to provide concrete source code examples matching certain criteria, and to quote the source code it found verbatim. Its response claimed the sources were quoted verbatim, but that wasn't true at all—they had been rewritten, still in the style of the project it was quoting from, but otherwise quite different, and without a match in the Git history. It looked a bit like someone at Google subscribed to a legal theory under which you can avoid copyright infringement by taking a derivative work and applying a mechanical obfuscation to it.
Workaccount2 6 hours ago
LLMs are not archives of information. People seem to have this belief, or perhaps just a general intuition, that an LLM is a Google search over its training set with a fancy language engine on the front end. That's not what they are. The models (almost) avoid copying by construction, because they never store anything verbatim in the first place—which is why a model is a dense web of weight connections rather than an orderly bookshelf of copied training data.

Picture yourself contorting your hands under a spotlight to cast a shadow in the shape of a bird. The bird is not in your fingers, even though the shadow of the bird and the shadow of your hand look very similar. Furthermore, your hand-shadow has no idea what a bird is.
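The "dense web of weights, not a bookshelf" point can be made concrete with a toy sketch. This is nothing like a real transformer—just a character-bigram model, purely illustrative—but it shows the same principle: after "training," only transition weights remain, and the training text itself is not stored anywhere in the model.

```python
# Toy illustration (NOT a real LLM): a character-bigram model "trained" on a
# corpus keeps only transition probabilities, not the corpus itself.
from collections import defaultdict

corpus = "the cat sat on the mat"

# "Training": count which character follows which.
counts = defaultdict(lambda: defaultdict(int))
for a, b in zip(corpus, corpus[1:]):
    counts[a][b] += 1

# Normalize counts into weights (probabilities).
weights = {
    a: {b: n / sum(nexts.values()) for b, n in nexts.items()}
    for a, nexts in counts.items()
}

# The model is just weights; the original text is gone.
# 't' is followed by 'h' twice ("the", "the") and ' ' twice ("cat ", "sat ").
print(weights["t"])            # {'h': 0.5, ' ': 0.5}
print(corpus in str(weights))  # False: the training text is not in the model
```

Generating from such a model means sampling from those weights, which can reproduce fragments that *resemble* the training data without any lookup into a stored copy—the same way the hand-shadow resembles a bird.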