jxmorris12 3 days ago
Matryoshka embeddings are not sparse. And SPLADE can scale to tens or hundreds of thousands of dimensions.
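For anyone who wants to see that concretely, here's a rough sketch of a SPLADE forward pass with a published checkpoint (naver/splade-cocondenser-ensembledistil; the log-saturated ReLU + max pooling is the SPLADE v2 formulation as I remember it, so treat the details as illustrative):

    import torch
    from transformers import AutoModelForMaskedLM, AutoTokenizer

    name = "naver/splade-cocondenser-ensembledistil"
    tok = AutoTokenizer.from_pretrained(name)
    model = AutoModelForMaskedLM.from_pretrained(name)

    inputs = tok("matryoshka embeddings vs splade", return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits  # (1, seq_len, vocab_size)

    # SPLADE activation: log(1 + ReLU(logits)), max-pooled over tokens
    # (a real impl would also mask out padding positions)
    sparse = torch.log1p(torch.relu(logits)).max(dim=1).values.squeeze(0)

    print(sparse.shape)        # torch.Size([30522]) -- one dim per vocab entry
    print((sparse > 0).sum())  # only a few hundred nonzeros, i.e. actually sparse

The output lives in a vocab-sized space, so swapping in a bigger-vocabulary backbone scales the dimensionality directly.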
faxipay349 a day ago | parent | next
Yeah, the standard SPLADE model trained from BERT already has a vocabulary (and hence vector) size of 30,522. If the SPLADE model is based on a multilingual BERT variant, such as mBERT (~120,000 tokens) or XLM-R (~250,000 tokens), the vocabulary, and with it the vector size, grows accordingly.
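Easy to check the vocab sizes directly (numbers below are from memory, verify locally):

    from transformers import AutoTokenizer

    for name in ["bert-base-uncased",
                 "bert-base-multilingual-cased",
                 "xlm-roberta-base"]:
        print(name, AutoTokenizer.from_pretrained(name).vocab_size)

    # bert-base-uncased             30522
    # bert-base-multilingual-cased  119547
    # xlm-roberta-base              250002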
CuriouslyC 3 days ago | parent | prev
If you consider the full higher-dimensional representation to be the actual latent space, and you take only the leading principal components, the remaining coordinates are zero. Pretty sparse. No, it's not a linked-list sparse matrix. Don't be a pedant.
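To make the argument explicit: a Matryoshka embedding truncated at query time, viewed as a vector in the full space, is mostly zeros. A toy sketch (dimensions are made up):

    import numpy as np

    full = np.random.randn(1024)   # stand-in for a full Matryoshka embedding
    k = 64                         # truncation level used at query time

    truncated = np.zeros_like(full)
    truncated[:k] = full[:k]       # keep the leading dims, zero the rest

    print(np.count_nonzero(truncated) / full.size)  # 0.0625 -- "pretty sparse"

Whether zeroed-out trailing dims count as sparsity in the retrieval sense is exactly what's being argued above.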