OgsyedIE 2 days ago

You can only get 12,000! concepts if you pair each concept with an ordering of the dimensions, which models do not do. A vector in a model that has [weight_1, weight_2, ..., weight_12000] is identical to the vector [weight_2, weight_1, ..., weight_12000] within the larger model.

Instead, a naive mental model of a language model assigns a positive, negative, or zero trit to each axis: 3^12,000 concepts, which is a much lower number than 12,000!. In practice, almost every vector in the model has all but a few dozen identified axes zeroed because of the limits of training time.
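
A back-of-the-envelope check of those magnitudes (a Python sketch; only the 12,000-dimension figure comes from this thread, the rest is illustrative):

    # Rough magnitude comparison, assuming 12,000 dimensions.
    import math

    N = 12_000
    log10_orderings = math.lgamma(N + 1) / math.log(10)   # log10(12,000!)
    log10_trits = N * math.log10(3)                        # log10(3^12,000)

    print(f"12,000!  ~ 10^{log10_orderings:,.0f}")   # roughly 10^43,741
    print(f"3^12,000 ~ 10^{log10_trits:,.0f}")       # roughly 10^5,725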

aabhay 2 days ago

You’re right. I gave the wrong number. My model implies 2^12,000 concepts, because you choose whether or not to include each concept to form your dimensional subspace.

I’m not even referring to the values within that subspace yet, so once you pick a concept you still have N degrees of freedom with which to create a complex manifold.
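
A sketch of that corrected count (same assumed 12,000 axes; the 40-axis example is a made-up illustration, not anything taken from a real model):

    import math
    import random

    N = 12_000

    # A "concept" here is a choice of which axes to include, i.e. a subset
    # of the axes, so there are 2^12,000 of them.
    print(f"2^12,000 ~ 10^{N * math.log10(2):,.0f}")   # roughly 10^3,612

    # One arbitrary concept: 40 active axes.  The continuous weight values
    # then live inside this chosen subspace, on top of the subset count.
    active_axes = random.sample(range(N), k=40)
    print(f"this concept spans a {len(active_axes)}-dimensional subspace")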

The main value of the mental model is to build an intuition for how “sparse” high-dimensional vectors are without resorting to a 3D sphere.
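
A small sketch of that sparsity intuition: two random vectors that each touch only a few dozen of 12,000 axes almost never share an axis, so they come out nearly orthogonal. (The 50-active-axes figure is an assumption chosen for illustration.)

    import numpy as np

    rng = np.random.default_rng(0)
    N, k = 12_000, 50   # total dimensions, active axes per vector

    def sparse_vector():
        v = np.zeros(N)
        axes = rng.choice(N, size=k, replace=False)   # pick k distinct axes
        v[axes] = rng.standard_normal(k)              # random values on them
        return v

    a, b = sparse_vector(), sparse_vector()
    shared = np.count_nonzero(a * b)                  # axes used by both
    cosine = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    print(f"shared axes: {shared}, cosine similarity: {cosine:.4f}")
    # Expected shared axes is k*k/N ~ 0.2, so most pairs overlap nowhere.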