Remix clone Hacker News

new | show | ask | jobs Github

	▲	adamzwasserman 2 days ago
		Exactly. A 1.2% false positive rate means unnecessary reads 1.2% of the time vs 100% without the filter. Even at 10% FP rate, you skip 90% of I/O. This asymmetry works great for I/O-bound workloads (skip-indexes) but fails for TFA's approach where every document needs its own filter. In practice, you combine both: inverted index for the dictionary (amortizes across documents), then bloom filters per chunk of the index (amortizes across chunks). This two-level approach handles scale much better than TFA's one-filter-per-document design. It's bloom filters as an optimization layer, not a replacement.