adamzwasserman 2 days ago
The "no sharing between filters" insight clicked for me on a different problem. I needed to filter items by tags. Bloom filter per item seemed clever - quick membership checks. But with thousands of items sharing dozens of tags, each filter re-encodes the same vocabulary. Pure waste. Switched to an inverted index (tag → item list) with bloom filters per chunk of the index. Now the tag vocabulary is shared, and bloom filters just speed up chunk-skipping when the index grows large. TFA's mistake is using bloom filters -instead- of an inverted index rather than on top of one. The amortization patterns stack, they don't compete.
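A rough sketch of that layout in Python, under assumptions that are not spelled out above: items are grouped into fixed-size chunks, each chunk keeps its own small tag → item list, and each chunk carries one Bloom filter over the tags that occur in it. The names `BloomFilter`, `Chunk`, `ChunkedTagIndex`, and the sizes (`num_bits=1024`, `num_hashes=3`, `chunk_size=2`) are all made up for illustration and are not tuned.

```python
import hashlib
from collections import defaultdict


class BloomFilter:
    """Tiny Bloom filter; the sizes are illustrative, not tuned."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, key):
        # Derive k bit positions by salting a single hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


class Chunk:
    """One slice of the index: a small inverted index plus a tag filter."""

    def __init__(self):
        self.postings = defaultdict(list)  # tag -> item ids in this chunk
        self.tags = BloomFilter()          # which tags occur in this chunk

    def add(self, item_id, tags):
        for tag in set(tags):
            self.postings[tag].append(item_id)
            self.tags.add(tag)


class ChunkedTagIndex:
    """Hypothetical chunked inverted index (assumed layout, see lead-in)."""

    def __init__(self, chunk_size=2):
        self.chunk_size = chunk_size
        self.chunks = []
        self._items_in_last = 0

    def add(self, item_id, tags):
        # Start a new chunk once the current one is full.
        if not self.chunks or self._items_in_last >= self.chunk_size:
            self.chunks.append(Chunk())
            self._items_in_last = 0
        self.chunks[-1].add(item_id, tags)
        self._items_in_last += 1

    def items_with_tag(self, tag):
        result = []
        for chunk in self.chunks:
            if not chunk.tags.might_contain(tag):
                continue  # filter says the tag is absent: skip the chunk
            # A false positive only costs one wasted lookup here.
            result.extend(chunk.postings.get(tag, []))
        return result


index = ChunkedTagIndex(chunk_size=2)
index.add(1, ["red", "small"])
index.add(2, ["blue"])
index.add(3, ["red"])
index.add(4, ["green", "small"])
red_items = index.items_with_tag("red")
```

With everything in memory the filter buys nothing over the dict lookup it guards. The skip presumably pays off only when a chunk's postings live on disk or over the network and the filters are small enough to keep resident.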
hinkley 2 days ago
Why do these “inverted indexes” just look like indexes to me? Too much time with databases perhaps?