kgeist 7 hours ago

Are there vector DBs with 100B vectors in production which work well? There was a paper showing a 12% loss in accuracy at just 1 million vectors. Maybe some kind of logical sharding is another option, to improve both accuracy and speed.
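
A rough sketch of the logical-sharding idea (the partition key and the brute-force per-shard search are hypothetical, just for illustration): route vectors into shards by some logical attribute, search only the shards relevant to the query, and merge the per-shard top-k results, so each shard stays small enough that accuracy holds up.

    # Hypothetical sketch of logical sharding: partition vectors by a logical key,
    # search only the relevant shards, then merge the per-shard top-k results.
    import numpy as np

    class ShardedIndex:
        def __init__(self, dim):
            self.dim = dim
            self.shards = {}  # shard_key -> (list of ids, (n, dim) float32 matrix)

        def add(self, shard_key, ids, vectors):
            # vectors are assumed L2-normalized so dot product == cosine similarity
            old_ids, old_vecs = self.shards.get(
                shard_key, ([], np.empty((0, self.dim), np.float32)))
            self.shards[shard_key] = (old_ids + list(ids),
                                      np.vstack([old_vecs, vectors]))

        def search(self, query, shard_keys, k=10):
            # fan out to the selected shards, brute-force within each, merge top-k
            hits = []
            for key in shard_keys:
                if key not in self.shards:
                    continue
                ids, vecs = self.shards[key]
                scores = vecs @ query
                for i in np.argsort(-scores)[:k]:
                    hits.append((float(scores[i]), ids[i]))
            return sorted(hits, key=lambda h: h[0], reverse=True)[:k]

In practice the per-shard search would be an ANN index rather than brute force, but the routing-plus-merge shape is the same.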

lmeyerov 4 hours ago | parent | next [-]

I don't know at these scales, but in the 1M-100M range we found that switching from out-of-the-box embeddings to fine-tuned embeddings took less of a sting on the compression/recall trade-off. We saw a 10-100X win here: comparable recall with better compression.

I'm not sure how that'd work with the binary quantization phase, though. For example, we use Matryoshka embeddings, and some of the bits matter way more than others, so that might be super painful.
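
To spell out the interaction (a toy illustration only, not our actual pipeline): Matryoshka-style models concentrate signal in the leading dimensions, so truncation keeps the most informative ones, but a sign-based 1-bit quantizer then gives every surviving dimension equal weight, which is where the "some bits matter more than others" pain comes from.

    import numpy as np

    def matryoshka_truncate(emb, dim):
        # Matryoshka-style usage: keep only the leading `dim` dimensions, re-normalize
        v = emb[:dim]
        return v / np.linalg.norm(v)

    def binary_quantize(emb):
        # 1 bit per dimension: just the sign; every kept dimension weighs the same
        return (emb > 0).astype(np.uint8)

    rng = np.random.default_rng(0)
    full = rng.normal(size=1024).astype(np.float32)
    full /= np.linalg.norm(full)

    short = matryoshka_truncate(full, 256)   # 1024 floats -> 256 floats
    bits = binary_quantize(short)            # 256 floats  -> 256 one-bit values
    packed = np.packbits(bits)               # 256 bits    -> 32 bytes
    print(f"{full.nbytes} bytes -> {packed.nbytes} bytes per vector")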

jasonjmcghee 4 hours ago | parent | prev | next [-]

So many missing details...

Different vector indexes have very different recall, and even the parameters for each dramatically impact this.

HNSW can have very good recall even at high vector counts.

There's also the embedding model, whether you're quantizing, whether it's pure RAG vs. hybrid BM25 / static word embeddings vs. graph connections, whether you're reranking, etc.
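
For example, a quick sketch with hnswlib (just one HNSW implementation; the data and parameter values here are made up): M and ef_construction are build-time knobs, and ef at query time directly trades recall against latency.

    import numpy as np
    import hnswlib

    # The index parameters (M, ef_construction) and the query-time ef value are
    # the main knobs that trade recall against speed and memory.
    dim, n = 128, 10_000
    data = np.random.rand(n, dim).astype(np.float32)

    index = hnswlib.Index(space="cosine", dim=dim)
    index.init_index(max_elements=n, ef_construction=200, M=32)  # build-time knobs
    index.add_items(data, np.arange(n))

    index.set_ef(128)  # query-time knob: higher ef -> better recall, slower queries
    labels, distances = index.knn_query(data[:10], k=10)

Measuring recall against brute-force ground truth while sweeping these values is usually the only way to know what a given setup actually delivers.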

_peregrine_ 6 hours ago | parent | prev [-]

the solution described in the blog post is currently in production at 100B vectors

rahimnathwani 6 hours ago | parent [-]

For what/who?

_peregrine_ 5 hours ago | parent | next [-]

unfortunately i'm not able to share the customer or use case :( but the metrics that you see in the first charts in the post are from a production cluster

esafak 5 hours ago | parent | prev [-]

https://turbopuffer.com/customers/cursor

_peregrine_ 5 hours ago | parent [-]

this is actually not how cursor uses turbopuffer: they index per codebase and thus need many mid-sized indexes, as opposed to the one massive index this post describes
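
As a hypothetical sketch of that per-codebase pattern (not turbopuffer's or cursor's actual API, just an illustration with hnswlib): one small index per codebase namespace, with each query scoped to a single namespace instead of searching one giant index.

    import numpy as np
    import hnswlib

    # Hypothetical per-namespace layout: one small HNSW index per codebase,
    # so a query only ever touches that codebase's vectors.
    indexes = {}  # codebase_id -> hnswlib.Index

    def get_index(codebase_id, dim, max_elements=100_000):
        if codebase_id not in indexes:
            idx = hnswlib.Index(space="cosine", dim=dim)
            idx.init_index(max_elements=max_elements, ef_construction=100, M=16)
            indexes[codebase_id] = idx
        return indexes[codebase_id]

    def upsert(codebase_id, ids, vectors):
        get_index(codebase_id, vectors.shape[1]).add_items(vectors, ids)

    def search(codebase_id, query_vector, k=10):
        # no cross-codebase fan-out: the query is scoped to one namespace
        return indexes[codebase_id].knn_query(query_vector, k=k)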