lyu07282 3 days ago:
Show your math lol | ||||||||
Voloskaya 3 days ago:
I assume by "node" OP meant something like a DGX node. Which, yeah, would work, but not everyone (no one?) wants to buy a $500k system to do vector search.

B200 specs:

* 8 TB/s HBM bandwidth
* ~10 PetaOPS at int8
* 186 GB of VRAM

If we work with 1B 512-dimensional int8 embeddings, we need 512 GB of VRAM to hold them, so with an 8xB200 node (~$500k++) they fit easily (125M vectors per GPU).

A dot product between two 512-dim vectors takes roughly 1,000 ops, so a full scan over 1B vectors is 1,000 * 1B = 1 TeraOP. Spread over 8 GPUs that's ~125 GigaOPs per GPU, i.e. a fraction of a millisecond of compute.

The real bottleneck is moving the data from HBM to the compute units: 125M vectors per GPU is 64 GB, which takes ~8 ms to stream at 8 TB/s.

So there you go: the most expensive vector search in history, giving you roughly the same performance as a regular CPU-based vector DB for only ~1000x the price.
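For anyone who wants to poke at the arithmetic, here's a rough back-of-the-envelope sketch. The per-GPU figures and the 1B x 512-dim int8 assumption are taken straight from the comment above; they're ballpark numbers, not a benchmark.

    # Back-of-the-envelope check of the brute-force scan numbers above.
    # Assumptions (from the comment): 1B vectors, 512-dim int8, 8x B200.
    NUM_VECTORS = 1_000_000_000
    DIM = 512
    BYTES_PER_VECTOR = DIM        # int8 -> 1 byte per dimension
    NUM_GPUS = 8

    HBM_BW = 8e12                 # 8 TB/s per GPU
    INT8_OPS = 10e15              # ~10 PetaOPS per GPU at int8
    VRAM = 186e9                  # 186 GB per GPU (quoted figure)

    total_bytes = NUM_VECTORS * BYTES_PER_VECTOR        # 512 GB total
    per_gpu_vectors = NUM_VECTORS // NUM_GPUS           # 125M vectors/GPU
    per_gpu_bytes = per_gpu_vectors * BYTES_PER_VECTOR  # 64 GB/GPU

    # ~2*DIM ops per dot product (multiply + add) ~= 1,000 ops for DIM=512
    ops_per_query_per_gpu = 2 * DIM * per_gpu_vectors   # ~128 GigaOPs
    compute_time = ops_per_query_per_gpu / INT8_OPS     # ~13 microseconds
    memory_time = per_gpu_bytes / HBM_BW                # ~8 ms

    print(f"embeddings: {total_bytes / 1e9:.0f} GB total, "
          f"{per_gpu_bytes / 1e9:.0f} GB per GPU (VRAM: {VRAM / 1e9:.0f} GB)")
    print(f"compute per query per GPU: {compute_time * 1e6:.1f} us")
    print(f"HBM read per query per GPU: {memory_time * 1e3:.1f} ms  <- bottleneck")

Running it reproduces the point being made: compute is on the order of microseconds per query, but streaming the full 64 GB per GPU out of HBM takes ~8 ms, so a brute-force scan is memory-bandwidth bound.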