jxmorris12 3 days ago

Matryoshka embeddings are not sparse. And SPLADE can scale to tens or hundreds of thousands of dimensions.

faxipay349 a day ago

Yeah, the standard SPLADE model trained from BERT already has a vocabulary, and hence vector, size of 30,522. If the SPLADE model is based on a multilingual variant of BERT such as mBERT or XLM-R, the vocabulary inherently grows to roughly 120,000 or 250,000 tokens respectively, and the vector size grows with it.
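
For concreteness, here's a minimal sketch of how a SPLADE model produces those vocabulary-sized sparse vectors. The checkpoint name is just an example (naver/splade-cocondenser-ensembledistil, one of the public SPLADE checkpoints); any SPLADE-style masked-LM head is pooled the same way:

    import torch
    from transformers import AutoModelForMaskedLM, AutoTokenizer

    # Example checkpoint; any SPLADE-style MLM checkpoint works the same way.
    model_id = "naver/splade-cocondenser-ensembledistil"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForMaskedLM.from_pretrained(model_id)

    inputs = tokenizer("sparse retrieval with learned term weights",
                       return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits  # (1, seq_len, vocab_size=30522)

    # SPLADE pooling: log-saturated ReLU, then max over the sequence,
    # masked so padding tokens contribute nothing.
    weights = torch.log1p(torch.relu(logits))
    weights = weights * inputs["attention_mask"].unsqueeze(-1)
    sparse_vec = weights.max(dim=1).values.squeeze(0)  # (vocab_size,)

    print((sparse_vec > 0).sum().item(), "active dims of", sparse_vec.numel())

Typically only a small fraction of the 30,522 dimensions come out non-zero for a short input, which is what makes these vectors cheap to store in an inverted index despite the huge nominal dimensionality.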

CuriouslyC 3 days ago

If you consider the full higher-dimensional representation the actual latent space, then truncation just keeps the leading principal components and the remaining coordinates are zero. Pretty sparse. No, it's not a linked-list sparse matrix. Don't be a pedant.

yorwba 3 days ago

When you truncate Matryoshka embeddings, you get the storage benefits of low-dimensional vectors with the limited expressiveness of low-dimensional vectors. Usually, what people look for in sparse vectors is to combine the storage benefits of low-dimensional vectors with the expressiveness of high-dimensional vectors. For that, you need the non-zero dimensions to be different for different vectors.
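
A toy numpy contrast of the two regimes, illustrative only and not tied to any particular library's API:

    import numpy as np

    DIM = 30_522  # e.g. a BERT-sized vocabulary
    rng = np.random.default_rng(0)

    # Truncated Matryoshka: every vector is dense in the *same* leading
    # 64 dimensions, so the usable space really is 64-dimensional.
    mat_a = np.zeros(DIM)
    mat_a[:64] = rng.standard_normal(64)
    mat_b = np.zeros(DIM)
    mat_b[:64] = rng.standard_normal(64)

    # Sparse embeddings: the same storage budget (64 non-zeros), but each
    # vector picks its own dimensions out of all 30,522.
    def random_sparse(k=64):
        idx = rng.choice(DIM, size=k, replace=False)
        return dict(zip(idx.tolist(), rng.random(k).tolist()))

    sp_a, sp_b = random_sparse(), random_sparse()
    print(len(set(sp_a) | set(sp_b)), "distinct dims across two sparse vectors")

Two truncated vectors always share the same 64 dimensions; two sparse vectors with the same storage cost will almost surely use close to 128 distinct ones, which is where the extra expressiveness comes from.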

zwaps 3 days ago

No one means Matryoshka embeddings when they talk about sparse embeddings. This is not pedantic.

CuriouslyC 3 days ago

No one means wolves when they talk about dogs, obviously wolves and dogs are TOTALLY different things.

cap11235 3 days ago

Why?