k__ 5 days ago

Awesome, thanks!

If I understand this correctly, there are three major problems with LLMs right now.

1. LLMs reduce a very high-dimensional vector space into a very low-dimensional vector space. Since we don't know what the dimensions in the low-dimensional vector space mean, we can only check that the outputs are correct most of the time.

What research is happening to resolve this?

2. LLMs use written texts to facilitate this reduction. So they don't learn from reality, but from what humans have written down about reality.

It seems like Keen Technologies tries to avoid this issue by using (simple) robots with sensors for training instead of human text. That seems like a much slower process, but it could yield more accurate models in the long run.

3. LLMs hold internal state as a vector that reflects the meaning and context of the "conversation". That would explain why the quality of responses deteriorates in longer conversations: if one vector is "stamped over" again and again, the meaning of the first "stamps" gets blurred (there's a toy sketch of this intuition below).

Are there alternative ways of holding state, or is the only way around this to back up that state vector at every point and revert if things go awry?
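To make concrete what I mean by "stamped over", here's a toy sketch (assuming an RNN-style single state vector, which I understand is not exactly how transformers store context; the numbers are made up):

    # Toy illustration: one running state vector, each new turn mixed in on top.
    # This is an RNN-like caricature, not a real transformer (which keeps
    # per-token keys/values), but it shows how early content gets blurred.
    import numpy as np

    rng = np.random.default_rng(0)
    dim = 300

    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    first_turn = rng.normal(size=dim)   # stand-in embedding of the first "stamp"
    state = first_turn.copy()

    for step in range(1, 51):
        new_turn = rng.normal(size=dim)          # each later turn of the conversation
        state = 0.9 * state + 0.1 * new_turn     # "stamp over" the running state
        if step % 10 == 0:
            print(step, round(cosine(state, first_turn), 3))
    # the similarity to the first turn decays toward 0 as more turns are mixed in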

agentcoops 5 days ago | parent | next [-]

Apologies if this comes across as too abstract, but I think your comment raises really important questions.

(1) While studying the properties of the mathematical objects produced is important, I don't think we should understand the situation you describe as a problem to be solved. In old supervised machine learning methods, human beings were tasked with defining the rather crude 'features' of relevance in a data/object domain, so each dimension had some intuitive significance (often binary 'is tall', 'is blue' etc). The question now is really about learning the objective geometry of meaning, so the dimensions of the resultant vector don't exactly have to be 'meaningful' in the same way -- and, counter-intuitive as it may seem, this is progress. Now the question is of the necessary dimensionality of the mathematical space in which semantic relations can be preserved -- and meaning /is/ in some fundamental sense the resultant geometry.
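To make that concrete with a small sketch (assuming the sentence-transformers package and the commonly used 'all-MiniLM-L6-v2' model; any sentence encoder would do): no single coordinate of these vectors means anything by itself, but the angles between them track semantic relatedness.

    # Sketch: individual embedding dimensions mean nothing by themselves,
    # but distances/angles between vectors encode semantic relations.
    # Assumes: pip install sentence-transformers
    from sentence_transformers import SentenceTransformer
    import numpy as np

    model = SentenceTransformer("all-MiniLM-L6-v2")   # 384-dimensional embeddings
    words = ["king", "queen", "cabbage"]
    vecs = model.encode(words)                        # shape (3, 384)

    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    print(cosine(vecs[0], vecs[1]))   # king vs queen   -> relatively high
    print(cosine(vecs[0], vecs[2]))   # king vs cabbage -> lower
    print(vecs[0][:5])                # first few coordinates: no intuitive meaning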

(2) This is where the 'Platonic hypothesis' research [1] is so fascinating: empirically we have found that the learned structures from text and image converge. This isn't saying we don't need images and sensor robots, but it appears we get the best results when training across modalities (language and image, for example). This has interesting implications for how we understand language. While any particular text might get things wrong, the language that human beings have developed over however many thousands of years really does seem to do a good job of breaking out the relevant possible 'features' of experience. The convergence of models trained from language and image suggests a certain convergence between what is learnable from sensory experience of the world and the relations that human beings have slowly come to know through the relations between words.

[1] https://phillipi.github.io/prh/ and https://arxiv.org/pdf/2405.07987
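For what it's worth, "convergence" here means comparing the geometry two different models assign to the same set of inputs. A minimal sketch of one common alignment metric (linear CKA; the paper itself uses a mutual nearest-neighbour metric, but the idea is similar): two representation matrices over the same N items are compared after centering.

    # Sketch: linear CKA between two models' representations of the same N items.
    # X: (N, d1) embeddings from model A, Y: (N, d2) embeddings from model B.
    # A score near 1 means the two models induce very similar geometry.
    import numpy as np

    def linear_cka(X, Y):
        X = X - X.mean(axis=0)                 # center each representation
        Y = Y - Y.mean(axis=0)
        xy = np.linalg.norm(X.T @ Y, "fro") ** 2
        xx = np.linalg.norm(X.T @ X, "fro")
        yy = np.linalg.norm(Y.T @ Y, "fro")
        return xy / (xx * yy)

    rng = np.random.default_rng(0)
    Z = rng.normal(size=(1000, 50))            # shared latent structure
    X = Z @ rng.normal(size=(50, 384))         # "language model" view of the items
    Y = Z @ rng.normal(size=(50, 768))         # "vision model" view of the same items
    print(linear_cka(X, Y))                              # high: shared geometry
    print(linear_cka(X, rng.normal(size=(1000, 768))))   # unrelated: near 0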

k__ 5 days ago | parent | next [-]

1) Fair. I did some experiments with recommendation systems 15 years ago and we basically stopped using dimensions generated by the system, because nobody could make anything of them. The human-made dimensions were much easier to create user archetypes from.

niam 5 days ago | parent | prev [-]

Re: #2

I've never really challenged that text is a suitable stand-in for important bits of reality. I worry instead about meta-limitations of text: can we reliably scale our training corpus without accreting incestuous slop from other models?

Sensory bots would seem to provide a convenient way out of this problem, but I'm not well-read enough to know.

visarga 5 days ago | parent | prev | next [-]

> LLMs reduce a very high-dimensional vector space into a very low-dimensional vector space.

What do you mean? There is an embedding size that is maintained constant from the first layer to the last. Embedding lookup, N x transformer layers, softmax - all three of them have the same dimension.
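You can check this directly (a quick sketch with the Hugging Face transformers library, using GPT-2 purely as a small example): the embedding output and every one of the 12 blocks produce tensors of the same width.

    # Sketch: the hidden width stays constant through the whole stack.
    # Assumes the transformers library; GPT-2 is just a small, convenient model.
    import torch
    from transformers import AutoTokenizer, AutoModel

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModel.from_pretrained("gpt2", output_hidden_states=True)

    inputs = tok("the geometry of meaning", return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs)

    # 13 tensors: embedding output + 12 transformer blocks, all (batch, seq, 768)
    for i, h in enumerate(out.hidden_states):
        print(i, tuple(h.shape))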

Maybe you mean LoRA, which is "reducing a high-dimensional vector space into a lower-dimensional vector space"?
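That would look roughly like this (a minimal sketch of the LoRA idea, not any particular library's code): the frozen weight stays full-size, and only a rank-r correction is learned.

    # Sketch of the LoRA idea: a full-rank frozen weight plus a learned
    # low-rank update. Only A and B (rank r << d) are trained.
    import numpy as np

    d_in, d_out, r = 768, 768, 8
    rng = np.random.default_rng(0)

    W = rng.normal(size=(d_out, d_in))      # pretrained weight, frozen
    A = rng.normal(size=(r, d_in)) * 0.01   # trainable, maps d_in -> r
    B = np.zeros((d_out, r))                # trainable, maps r -> d_out (starts at 0)
    alpha = 16

    def forward(x):
        # x: (batch, d_in); effective weight is W + (alpha / r) * B @ A
        return x @ W.T + (alpha / r) * (x @ A.T) @ B.T

    x = rng.normal(size=(2, d_in))
    print(forward(x).shape)   # (2, 768): output width unchanged; only the
                              # *update* passes through an r-dimensional bottleneck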

k__ 5 days ago | parent [-]

I mean LLMs reduce the "vector space" that describes reality into a vector space with fewer dimensions (e.g. 300 in the article I was replying to).
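Something like classic word embeddings is what I had in mind, e.g. (a sketch with gensim's word2vec, one standard way to get 300-dimensional vectors out of raw text; the toy corpus is made up):

    # Sketch: compressing words into 300-dimensional vectors learned from text.
    from gensim.models import Word2Vec

    sentences = [["the", "cat", "sat", "on", "the", "mat"],
                 ["the", "dog", "lay", "on", "the", "rug"]]
    model = Word2Vec(sentences, vector_size=300, window=2, min_count=1, epochs=50)

    print(model.wv["cat"].shape)                 # (300,)
    print(model.wv.most_similar("cat", topn=2))  # neighbours in the learned space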

hangonhn 5 days ago | parent | prev [-]

Point 1 is such an interesting and perhaps profound observation about NNs in general (credit to both you and the original author). I had never thought of it that way but it seems to make intuitive sense.