| ▲ | godelski 2 days ago | |||||||
There's confusing terminology here and without clarification people talk past one another."What its been trained on" is a distribution. It can produce things from that distribution and only things from that distribution. If you train on multiple distributions, you get the union of the distribution, making a distribution. This is entirely different from saying it can only reproduce samples which it was trained on. It is not a memory machine that is surgically piecing together snippets of memorized samples. (That would be a mind bogglingly impressive machine!) A distribution is more than its samples. It is the things between too. Does the LLM perfectly capture the distribution? Of course not. But it's a compression machine so it compresses the distribution. Again, different from compressing the samples, like one does with a zip file. So distributionally, can it produce anything novel? No, of course not. How could it? It's not magic. But sample wise can it produce novel things? Absolutely!! It would be an incredibly unimpressive machine if it couldn't and it's pretty trivial to prove that it can do this. Hallucinations are good indications that this happens but it's impossible to do on anything but small LLMs since you can't prove any given output isn't in the samples it was trained on (they're just trained on too much data).
Up until very recently most LLMs have struggled with the prompt
This is certainly in their training distribution and has been for years, so I wouldn't even conclude that they can solve problems "in their training data". But that's why I said it's not a perfect model of the distribution.
One needs to be quite careful with examples as you'll have to make the unverifiable assumption that such a sample does not exist in the training data. With the size of training data this is effectively unverifiable.But I would also argue that humans can do more than that. Yes, we can combine concepts, but this is a lower level of intelligence that is not unique to humans. A variation of this is applying a skill from one domain into another. You might see how that's pretty critical to most animals survival. But humans, we created things that are entirely outside nature require things outside a highly sophisticated cut and paste operation. Language, music, mathematics, and so much more are beyond that. We could be daft and claim music is simply cut and paste of songs which can all naturally be reproduced but that will never explain away the feelings or emotion that it produces. Or how we formulated the sounds in our heads long before giving them voice. There is rich depth to our experiences if you look. But doing that is odd and easily dismissed as our own familiarity deceives us into our lack of. | ||||||||
| ▲ | XenophileJKO a day ago | parent | next [-] | |||||||
The limit of a LLM "distribution" effectively is actually only at the token level though once the model has consumed enough language. Which is why those out of distribution tokens are so problematic. From that point on the model can infer linguistics even on purely encountered words, concepts. I would even propose in context inferred meaning based on context, just like you would do. It builds conceptual abstractions of MANY levels and all interrelated. So imagine giving it a task like "design a car for a penguin to drive". The LLM can infer what kinda of input does a car need, what anatomy does a penguin have and it can wire it up descriptively. It is an easy task for an LLM. When you think about the other capabilities like introspection, and external state through observation (any external input), there really are not many fundamental limits on what they can do. (Ignore image generation, it is an important distinction on how an image is made, end to end sequence vs. pure diffusion vs. hybrid.) | ||||||||
| ||||||||
| ▲ | astrange 2 days ago | parent | prev [-] | |||||||
> This is entirely different from saying it can only reproduce samples which it was trained on. It is not a memory machine that is surgically piecing together snippets of memorized samples. (That would be a mind bogglingly impressive machine!) You could create one of those using both a Markov chain and an LLM. | ||||||||
| ||||||||