ttul 5 days ago

The weird thing about high-dimensional spaces is that most pairs of vectors are nearly orthogonal to each other, and most are also very far apart. It’s remarkable that you can still cluster concepts using dimension-reduction techniques when there are 50,000 dimensions to play with.
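A quick numerical check of that near-orthogonality and distance concentration (a minimal sketch using random Gaussian unit vectors as a stand-in for embedding vectors, not actual LLM embeddings):

    import numpy as np

    rng = np.random.default_rng(0)
    d, n = 50_000, 200               # dimensionality and number of random vectors

    # Random unit vectors as a stand-in for embeddings.
    X = rng.standard_normal((n, d))
    X /= np.linalg.norm(X, axis=1, keepdims=True)

    cos = X @ X.T                    # pairwise cosine similarities
    off = cos[~np.eye(n, dtype=bool)]
    print("mean |cos|:", np.abs(off).mean())   # on the order of 1/sqrt(d) ~ 0.004
    print("max  |cos|:", np.abs(off).max())

    # For unit vectors, ||u - v||^2 = 2 - 2*cos(u, v), so pairwise distances
    # also concentrate tightly around sqrt(2).
    dist = np.sqrt(np.clip(2.0 - 2.0 * off, 0.0, None))
    print("distance range:", dist.min(), dist.max())   # both close to 1.414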

roadside_picnic 5 days ago | parent | next [-]

It would be weird if the points in the embedding space were uniformly distributed, but they're not. The entire role of the model, in general, is to project those results onto a subset of the larger space that "makes sense" for the problem. Ultimately the projection becomes one in which the latent categories we're trying to predict (class, token, etc.) become linearly separable.
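A minimal sketch of that non-uniformity, using synthetic points that genuinely lie near a low-dimensional subspace of a much larger space (the dimensions, rank, and noise level here are arbitrary choices, not measurements from any real model):

    import numpy as np

    rng = np.random.default_rng(1)
    d, k, n = 4096, 32, 2000    # ambient dim, latent dim, number of "embeddings"

    # Synthetic "embeddings": points that really live in a k-dim subspace
    # of the d-dim space, plus a little isotropic noise.
    basis = rng.standard_normal((k, d))
    latents = rng.standard_normal((n, k))
    E = latents @ basis + 0.01 * rng.standard_normal((n, d))

    # Singular value spectrum: almost all variance sits in the first k
    # directions, i.e. the points are nowhere near uniformly spread over
    # the full d-dim space.
    s = np.linalg.svd(E - E.mean(axis=0), compute_uv=False)
    var = s**2 / (s**2).sum()
    print("variance captured by top 32 directions:", var[:k].sum())   # ~0.99+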

gbacon 5 days ago | parent | prev [-]

Cosine similarity is your friend.

nsingh2 5 days ago | parent [-]

Cosine similarity is the dot product of vectors that have been normalized to lie on the unit sphere. Normalization doesn't alter orthogonality, nor does it change the fact that most high‑dimensional vectors are (nearly) orthogonal.
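A small illustration of both points, in plain NumPy (nothing model-specific):

    import numpy as np

    def cosine(u, v):
        # Cosine similarity is just the dot product after projecting both
        # vectors onto the unit sphere.
        return np.dot(u / np.linalg.norm(u), v / np.linalg.norm(v))

    u = np.array([3.0, 0.0, 0.0])
    v = np.array([0.0, 5.0, 0.0])
    w = np.array([1.0, 1.0, 0.0])

    print(cosine(u, v))   # 0.0 -- orthogonal before and after normalization
    print(cosine(u, w))   # ~0.707

    # In high dimensions, random vectors stay nearly orthogonal after
    # normalization too: normalization only rescales lengths.
    rng = np.random.default_rng(2)
    a, b = rng.standard_normal(10_000), rng.standard_normal(10_000)
    print(cosine(a, b))   # close to 0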

samrus 5 days ago | parent [-]

Maybe cosine similarity isn't the silver bullet, but going back to the point: why don't LLM embedding spaces suffer from the curse of dimensionality?

namibj 5 days ago | parent [-]

They do. It's just that two vectors are orthogonal as soon as they're orthogonal when projected down to a subspace containing them; for example, if a single coordinate is all they differ on, and the value there is inverted between the two vectors, then those two vectors _are already orthogonal._

In d dimensions you can have at most d vectors that are mutually orthogonal.

Interestingly, this means that for sequence lengths up to d you can have attention that precisely targets positions. As soon as you go to longer sequences, that's no longer universally possible.
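A toy sketch of that targeting claim, using one-hot position keys and a bare softmax rather than any particular model's attention mechanism:

    import numpy as np

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    rng = np.random.default_rng(3)
    d = 8

    # Sequence length <= d: take the standard basis as orthonormal position keys.
    K = np.eye(d)
    q = K[3]                                      # query aimed at position 3
    print(np.round(softmax(10.0 * (K @ q)), 3))   # nearly all mass on index 3

    # Sequence length n > d: n nonzero vectors in R^d cannot all be mutually
    # orthogonal (rank is at most d), so any query overlaps with other keys.
    n = 16
    K_long = rng.standard_normal((n, d))
    K_long /= np.linalg.norm(K_long, axis=1, keepdims=True)
    gram = K_long @ K_long.T
    off = np.abs(gram[~np.eye(n, dtype=bool)])
    print("max off-diagonal overlap:", off.max())         # well above 0
    q_long = K_long[3]
    print(np.round(softmax(10.0 * (K_long @ q_long)), 3)) # mass spills onto other positions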