ttul 5 days ago:
The weird thing about high-dimensional spaces is that most pairs of vectors are nearly orthogonal to each other, and most points are also very far apart. It's remarkable that you can still cluster concepts using dimension-reduction techniques when there are 50,000 dimensions to play with.
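A quick way to see the near-orthogonality claim (a minimal sketch, assuming NumPy; the dimensions and sample counts are just illustrative): the cosine similarity between independent random unit vectors concentrates around zero as the dimension grows, roughly like 1/sqrt(d).

    import numpy as np

    rng = np.random.default_rng(0)

    for d in (3, 300, 50_000):
        # Draw random Gaussian vectors and normalize each to unit length.
        x = rng.standard_normal((200, d))
        x /= np.linalg.norm(x, axis=1, keepdims=True)
        # Cosine similarity between the first vector and all the others
        # is just a dot product, since everything is unit length.
        cos = x[1:] @ x[0]
        print(f"d={d:>6}: mean |cos| = {np.abs(cos).mean():.4f}")
    # Typical output: mean |cos| shrinks from ~0.46 at d=3 to ~0.004 at
    # d=50,000, i.e. almost everything is almost orthogonal.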
roadside_picnic 5 days ago:
It would be weird if the points in the embedding space were uniformly distributed, but they're not. The entire role of the model is to project inputs onto a subset of the larger space that "makes sense" for the problem. Ultimately the projection becomes one in which the latent categories we're trying to predict (class, token, etc.) are linearly separable.
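To illustrate the non-uniformity point (a hedged sketch, assuming NumPy; the "embedding" here is synthetic data built from a handful of latent factors, not the output of a real model): data that lives near a low-dimensional subspace has almost all of its variance captured by a few directions, unlike isotropic noise filling the whole space.

    import numpy as np

    rng = np.random.default_rng(0)
    n, d, k = 2000, 512, 10  # n points, ambient dim d, latent dim k

    # "Embedding-like" points: generated from k latent factors plus
    # a little noise, so they hug a k-dimensional subspace of R^d.
    emb = rng.standard_normal((n, k)) @ rng.standard_normal((k, d))
    emb += 0.05 * rng.standard_normal((n, d))

    # Uniform-ish baseline: isotropic Gaussian noise in the full space.
    noise = rng.standard_normal((n, d))

    for name, X in (("embedding-like", emb), ("isotropic noise", noise)):
        # Fraction of total variance in the top-k singular directions.
        s = np.linalg.svd(X - X.mean(0), compute_uv=False)
        frac = (s[:k] ** 2).sum() / (s ** 2).sum()
        print(f"{name}: top-{k} directions explain {frac:.1%} of variance")
    # The structured data concentrates essentially all its variance in
    # k directions; the noise spreads it evenly across all d.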
gbacon 5 days ago:
Cosine similarity is your friend. | ||||||||||||||||||||||||||
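For anyone who hasn't used it (a minimal sketch, assuming NumPy): cosine similarity compares direction rather than magnitude, which is why it behaves well on embedding vectors.

    import numpy as np

    def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
        # Dot product of the vectors divided by the product of their norms.
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    a = np.array([1.0, 2.0, 3.0])
    b = np.array([2.0, 4.0, 6.0])   # same direction, different magnitude
    print(cosine_similarity(a, b))  # -> 1.0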