fnordpiglet 20 days ago

I prefer to think of them as interpolation machines, not extrapolation machines. They can project within the space they’re trained in, and what they produce may not be in their training corpus, but it must be implied by it. I don’t know if this is sufficient to make them too weak to create original “ideas” of this sort, but I think it is sufficient to make them incapable of original thought, as opposed to an expected thought that is very complex to evaluate.

drdeca 20 days ago | parent [-]

People keep saying this, but if you try to interpret this at all literally, it just doesn’t work. Like, it’s phrased like it should have a precise meaning, right? Like, people even mention convex hulls when talking about it.

But if you actually try to take a convex hull of some encoding of sentences as vectors? It isn’t true. The outputs are not in the convex hull of the training data.

I guess it’s supposed to be a metaphor and not literal, but in that case it’s confusing, especially seeing as there are contexts in machine learning where literal interpolation vs literal extrapolation is relevant. So, please, find a better way to say it than saying that “it can only interpolate”?

Muromec 19 days ago | parent [-]

If it's all just points in the multidimensional space, why would the thing be restricted to some operations and not others? I'm not buying the argument.

drdeca 19 days ago | parent [-]

Sorry, I don't understand what you mean. Are you agreeing or disagreeing with me?

If it can only interpolate in a literal sense, that means that it only produces good outputs on convex combinations of inputs that appear in the training set. That's what interpolation means. But if you take the embedding vectors of sentences/prompts, and then take the convex hull of these, it is not typical for a new sentence not in the training set to have its embedding vector be in that convex hull.

fnordpiglet 18 days ago | parent [-]

I’m not sure I follow your end-to-end reasoning. In an n-dimensional space, interpolation along and within the convex hull is pretty much what they’re doing. How can it possibly not be? How would it interpolate a point that’s not within its vector space? Yes, it’s very complex, with nonlinear transformations and a very high dimensionality, and residuals and other features create more complexity in the shape of the hull. But an LLM cannot infer a concept to which it has no information channel. That’s clearly nonsense. The fact that they do bounded, learned, nonlinear compositional generalizations over a representational space induced by training is by nature interpolation, not extrapolation. I’m sorry, but I believe their immense power has you confusing math with magic.

drdeca 5 days ago | parent [-]

A convex hull is a different thing than the linear span. It is smaller.
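
For reference, the two sets can be written side by side. This is only a sketch of the standard definitions; the symbols x_i, c_i and λ_i are illustrative and do not come from the thread:

```latex
\operatorname{span}(x_1,\dots,x_n)
  = \Bigl\{ \textstyle\sum_{i=1}^{n} c_i x_i \;:\; c_i \in \mathbb{R} \Bigr\}

\operatorname{conv}(x_1,\dots,x_n)
  = \Bigl\{ \textstyle\sum_{i=1}^{n} \lambda_i x_i \;:\; \lambda_i \ge 0,\ \sum_{i=1}^{n} \lambda_i = 1 \Bigr\}
```

Every convex combination is also a linear combination, so the hull sits inside the span; the extra constraints on the λ_i are what make it smaller.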

And, my point is that the inputs it is often fed are not in the convex hull of the inputs in the training data.

When the input space is very high dimensional, this is a common outcome.

I’m not denying that the outputs are causally downstream from the training data. Of course they are.

I’m saying that the inference time inputs aren’t in the convex hull of the training time inputs. This isn’t about saying that the output isn’t because of the training data. Of course it is.

But when you have very high dimensional input space, then even with many inputs in the training data, it is still common for inference time inputs to not be in the convex hull of the train time inputs.
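
A rough way to see this numerically, as a sketch under stated assumptions: the "inputs" here are i.i.d. standard Gaussian vectors standing in for embeddings (real embeddings are not distributed this way), and `certified_outside_hull` and `fraction_outside` are hypothetical helper names. The check looks for a separating hyperplane in one direction only, so it is a sufficient test: `True` proves the point is outside the hull, `False` proves nothing.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def certified_outside_hull(point, train):
    """True if one hyperplane strictly separates `point` from every training point.

    Sufficient, not necessary: True proves `point` is outside the convex hull
    of `train`; False only means this particular direction found no proof.
    """
    n = len(train)
    dim = len(point)
    mean = [sum(t[j] for t in train) / n for j in range(dim)]
    # Candidate separating direction: from the training mean toward the point.
    w = [p - m for p, m in zip(point, mean)]
    # If w.point exceeds w.t for every training point t, it also exceeds
    # w.c for every convex combination c of training points.
    return dot(w, point) > max(dot(w, t) for t in train)


def fraction_outside(dim, n_train, n_test, rng):
    """Fraction of fresh samples certified outside the hull of the training set."""
    train = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_train)]
    hits = 0
    for _ in range(n_test):
        fresh = [rng.gauss(0, 1) for _ in range(dim)]
        if certified_outside_hull(fresh, train):
            hits += 1
    return hits / n_test


rng = random.Random(0)  # fixed seed so runs are repeatable
low_dim = fraction_outside(dim=2, n_train=200, n_test=50, rng=rng)
high_dim = fraction_outside(dim=100, n_train=200, n_test=50, rng=rng)
print("certified outside, dim=2:  ", low_dim)
print("certified outside, dim=100:", high_dim)
```

With the same number of training points, the certified-outside fraction should be small in 2 dimensions and close to 1 in 100 dimensions. An exact membership test would need a linear program; the one-direction check is used here only to keep the sketch dependency-free.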

This has nothing to do with the complexities of how the models work after the initial embedding of the tokens as vectors. It’s just about the inputs that appear during training, and the inputs that appear at inference time.

> But an LLM can not infer a concept to which it has no information channel.

Of course! And nothing I said implies otherwise. Really, the point I’m making doesn’t even depend on what the model outputs!

If I took a best fit line from 1 parameter to a 1D output, and then provided that linear model an input that was outside the range of inputs the best fit line was obtained from, that would not be interpolation, it would be extrapolation.
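
The 1D case can be sketched in a few lines. The data and the names `fit_line` and `query_kind` are made up for illustration; in one dimension the convex hull of the training inputs is just the interval from their minimum to their maximum.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def query_kind(x, train_xs):
    """Classify a query by where the INPUT falls, not by what the model outputs."""
    if min(train_xs) <= x <= max(train_xs):
        return "interpolation"
    return "extrapolation"


# Hypothetical training data lying exactly on y = 2x + 1.
train_x = [0.0, 1.0, 2.0, 3.0, 4.0]
train_y = [1.0, 3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(train_x, train_y)

for query in (2.5, 10.0):
    prediction = slope * query + intercept
    print(query, query_kind(query, train_x), prediction)
```

The model happily returns a prediction for both queries; the label depends only on whether the query input lies inside the range of training inputs.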

It is similar here, except instead of the input being outside the convex hull due to being further away, it is outside the convex hull due to, like, the shape of the convex hull of training inputs just doesn’t include the point in question.