lmeyerov 4 hours ago

I don't know at these scales, but at the 1M-100M scale, we found that switching from out-of-box embeddings to fine-tuned embeddings took a lot of the sting out of the compression/recall trade-off. We had a 10-100X win here in terms of compression at comparable recall.

I'm not sure how that'd work with the binary quantization phase though. For example, we use Matryoshka embeddings, and some of the bits matter way more than others, so that might be super painful.
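
To make the concern concrete, here's a rough sketch (not our actual pipeline) of Matryoshka-style truncation followed by sign-based binary quantization. The function names, the 64 -> 16 dimension cut, and the random stand-in vectors are all assumptions for illustration. Note that the quantizer gives every kept dimension exactly one bit, regardless of how much signal that dimension carries:

```python
import numpy as np

def truncate_matryoshka(emb: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` coordinates and re-normalize to unit length."""
    head = emb[..., :dim]
    norms = np.linalg.norm(head, axis=-1, keepdims=True)
    return head / np.maximum(norms, 1e-12)  # guard against zero vectors

def binary_quantize(emb: np.ndarray) -> np.ndarray:
    """One bit per dimension (its sign), packed 8 dimensions per byte."""
    return np.packbits(emb > 0, axis=-1)

def hamming_distance(a: np.ndarray, b: np.ndarray) -> int:
    """Number of differing bits between two packed codes."""
    return int(np.unpackbits(np.bitwise_xor(a, b)).sum())

# Random vectors standing in for real embeddings (hypothetical data).
rng = np.random.default_rng(0)
full = rng.standard_normal((2, 64)).astype(np.float32)

short = truncate_matryoshka(full, 16)   # 64 floats -> 16 floats
codes = binary_quantize(short)          # 16 floats -> 2 bytes per vector
dist = hamming_distance(codes[0], codes[1])
```

Hamming distance weights all bits equally, which is exactly where it could clash with embeddings whose leading dimensions matter more than the trailing ones.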