hinkley 2 days ago

I had a problem where we needed to compare large data sets between machines for keys that existed in both, and the bandwidth cost just wasn’t mathing for the median result set size. I was trying to figure out how to send a fingerprint from machine A to B, then have machine B send the hits back. Or how many round trips I could do based on set size to minimize bandwidth + latency. I ended up with a calculus problem nobody could help me solve because of an n^5 term.

My boss was generally pretty good with obscure data structures, but neither of us had encountered Bloom filters. This was about three years after Google published their paper on how they were using Bloom filters, but the company I was working for would be bankrupt before I figured it out.
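For readers who haven't met Bloom filters either, here is a minimal sketch of how one could serve as the "fingerprint" in the A-to-B exchange described above. It is not what that system did (the comment says the problem went unsolved); the names `BloomFilter` and `common_keys`, the 1% false-positive target, and the SHA-256 double-hashing scheme are all assumptions chosen for illustration, and both machines are simulated in one process.

```python
import hashlib
import math


class BloomFilter:
    """Illustrative Bloom filter; sizing uses the standard textbook formulas."""

    def __init__(self, expected_items, false_positive_rate=0.01):
        n = max(1, expected_items)
        # m = -n * ln(p) / (ln 2)^2 bits, k = (m / n) * ln 2 hash functions
        self.num_bits = max(8, math.ceil(-n * math.log(false_positive_rate) / math.log(2) ** 2))
        self.num_hashes = max(1, round(self.num_bits / n * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, key):
        # Double hashing: derive k bit positions from two halves of one digest.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # keep the step non-zero
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key):
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


def common_keys(keys_a, keys_b, false_positive_rate=0.01):
    """Simulate one round trip: A sends a filter, B replies with probable hits."""
    # Machine A: build the fingerprint (only fingerprint.bits would cross the wire).
    fingerprint = BloomFilter(len(keys_a), false_positive_rate)
    for key in keys_a:
        fingerprint.add(key)

    # Machine B: send back every key that might be in A's set.
    candidates = [key for key in keys_b if key in fingerprint]

    # Machine A: discard false positives by checking against its real keys.
    return set(candidates) & set(keys_a)
```

Under these assumptions, a 1,000-key set at a 1% false-positive rate needs roughly 1.2 KB of filter, and B's reply contains the true matches plus about 1% of its non-matching keys. A Bloom filter never yields false negatives, so A's final check recovers the exact intersection.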