gbacon 5 days ago

Cosine similarity is your friend.

nsingh2 5 days ago | parent [-]

Cosine similarity is the dot product of vectors that have been normalized to lie on the unit sphere. Normalization doesn't alter orthogonality, nor does it change the fact that most high‑dimensional vectors are (nearly) orthogonal.
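To make that concrete, here is a minimal sketch in plain Python (standard library only). The helper names `cosine_similarity` and `mean_abs_cosine` are made up for illustration, and random Gaussian vectors are an assumption standing in for "most" vectors; real embeddings are not distributed this way. The average absolute cosine should shrink as the dimension grows.

```python
import math
import random


def cosine_similarity(u, v):
    """Dot product of u and v after scaling each to unit length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_abs_cosine(dim, pairs=200, seed=0):
    """Average |cosine| over random Gaussian vector pairs (illustrative assumption)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(pairs):
        u = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        total += abs(cosine_similarity(u, v))
    return total / pairs


# Compare a low, a medium, and a high dimension.
for dim in (2, 32, 1024):
    print(dim, round(mean_abs_cosine(dim), 3))
```

Rescaling either input leaves `cosine_similarity` unchanged, which is the sense in which normalization doesn't alter orthogonality.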

samrus 5 days ago | parent [-]

Maybe cosine similarity isn't the silver bullet, but going back to the point: why don't LLM embedding spaces suffer from the curse of dimensionality?

namibj 5 days ago | parent [-]

They do. It's just that two vectors are orthogonal as soon as they're orthogonal when projected down to any subspace; the latter means that if, for example, one coordinate is all they differ on, and its value is inverted between the two vectors, then these two vectors _are already orthogonal._

In d dimensions you can have d vectors that are mutually orthogonal.

Interestingly, this means that for sequence lengths up to d, attention can target positions precisely. As soon as you go to longer sequences, that's no longer universally possible.
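A toy sketch of that, in plain Python. Everything here is an illustrative assumption, not how any particular model encodes position: the keys are the standard basis vectors, `attention_weights` is a bare softmax over scaled dot products, and `scale=20.0` is an arbitrary sharpness factor. With d mutually orthogonal keys, querying with one key puts almost all the weight on its own position. A (d+1)-th unit key has to overlap with at least one existing key, so exact zero interference is gone.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(query, keys, scale=20.0):
    """Softmax over scaled query-key dot products (hypothetical toy attention)."""
    return softmax([scale * dot(query, key) for key in keys])


def max_overlap(extra, keys):
    """Largest |dot product| between an extra key and the existing keys."""
    return max(abs(dot(extra, key)) for key in keys)


d = 4
# d mutually orthogonal unit keys: the standard basis of R^d.
keys = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]

# Every distinct pair of keys has dot product exactly zero.
off_diagonal = [dot(keys[i], keys[j]) for i in range(d) for j in range(d) if i != j]

# Querying with the key of position 2 targets position 2.
target = 2
weights = attention_weights(keys[target], keys)

# A fifth unit vector in 4 dimensions cannot be orthogonal to all four keys.
extra = [0.5] * d  # unit length, since 4 * 0.25 == 1
overlap = max_overlap(extra, keys)

print(weights)
print(overlap)
```

For any unit vector `extra`, the squared dot products with an orthonormal basis sum to 1, so the largest overlap is at least 1/sqrt(d). How much that hurts in practice depends on the scale and on how the keys are learned.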