Remix.run Logo
namibj a day ago

B(+)Trees do actually admit a fast intersection (they offer a way more powerful projected-to-shared-keyspace mutual index join, technically it's even able to do antijoin but that'll actually modify iteration more than a very genericalized inner join; basically whenever you look at a key in any one of the involved indices you project it to a shared keyspace before doing the comparison-based-search things):

You get cache locality from the upper layers, and for navigation basically `let mut head = keyspace.min(); 'outer: while(!cursors[0].finished()){ for(&mut cursor in cursors.iter_mut()) { let new_head = cursor.seek_to_target_or_next_after_if_none_match(head); if (head != new_head) {continue 'outer; }} /* passed all without seeking past target on any one */ output_fun(head, cursors.iter().map(|x| x.val())); }`. If you want you can do the inner loop's seeks concurrently, which helps if those are IO latency bound and you can afford to waste absolute IOPS on eagerly doing that. You'll want to locally compute the max() of those returned and assign that to `head`. Imagine the cursors are lambda-parametrized to feel like they operate on the projected shared keyspace.

If the keys are a bitstring prefix suited to a binary prefix trie you can actually intersect that way, it's beyond worst case optimal when multiple key columns are involved. Sadly any simple implementation strategies of those algorithms have prohibitive external-memory-machine coefficients for their nominally poly-logarithmic IOPS, due to involvement of combinatorial explosion / curse of dimensionality in search tree/trie structures. They do work though. C.f. "Tetris-LoadBalanced"/"Tetris-Reordered".

The latter even tames one index containing "all" even numbers and the other "all" odd numbers, well, matters more if you involve 3+ columns :D