mgaunard 10 hours ago

Why doesn't it use k-d trees or r-trees?

cpa 9 hours ago

The big reason is that H3 is data-independent. You put your data in predefined bins and then join on them, whereas k-d trees and R-trees depend on the data, and building the trees may become prohibitive or very hard (especially in distributed systems).
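
A minimal sketch of the "predefined bins, then join" idea, for illustration only. It uses a plain square lat/lon grid as a stand-in for H3's hexagonal cells; `CELL_DEG`, `cell_id`, and `bin_join` are hypothetical names, not part of any H3 API.

```python
import math
from collections import defaultdict

CELL_DEG = 1.0  # assumed cell size in degrees; H3 uses hexagonal cells instead


def cell_id(lat, lon, size=CELL_DEG):
    """Map a point to a fixed grid cell. Depends only on the point itself."""
    return (math.floor(lat / size), math.floor(lon / size))


def bin_join(left, right):
    """Pair up records from both sides that fall in the same cell."""
    buckets = defaultdict(list)
    for name, lat, lon in right:
        buckets[cell_id(lat, lon)].append(name)
    return [
        (name, other)
        for name, lat, lon in left
        for other in buckets.get(cell_id(lat, lon), [])
    ]


cities = [("Paris", 48.85, 2.35), ("Berlin", 52.52, 13.40)]
sensors = [("s1", 48.10, 2.90), ("s2", 40.00, -3.70)]
pairs = bin_join(cities, sensors)
```

Pairs that share a cell are only candidates; a real spatial join would still apply the exact geometric predicate to each candidate pair afterwards.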

mgaunard 9 hours ago

Indices are meant to depend on the data, yes; that's not exactly rocket science.

Updating an R-tree is O(log n), just like any other index.

vouwfietsman 8 hours ago

I think the key is in the distributed nature: H3 is effectively a grid, so it can easily be distributed over nodes. A recursive structure is much harder to handle that way. R-trees are great if you are OK with indexing all the data on one node, which I think is a no-go for a global system.
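
To make the "grid distributes easily" point concrete, here is a rough sketch under assumptions: cells are identified by an integer pair, and nodes are chosen by hashing the cell id. `node_for_cell` is a hypothetical helper, not something from H3 or the article.

```python
import zlib


def node_for_cell(cell, n_nodes):
    """Pick a node from the cell id alone, using a hash that is stable across processes."""
    key = f"{cell[0]}:{cell[1]}".encode()
    return zlib.crc32(key) % n_nodes


# Any worker can route a record without consulting a shared index structure.
node = node_for_cell((48, 2), 16)
```

Because the mapping needs only the cell id, two workers that see points in the same cell send them to the same node without coordinating.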

This is all speculation, but intuitively your criticism makes sense.

Also, mapping 147k cities to countries should not take 16 workers and 1 TB of memory; I think the example in the article is not a realistic workload.

cpa 7 hours ago

To add to the sibling comment: if you have streaming data, you have to update the whole index every time with R-trees or k-d trees, whereas with H3 you just compute the bin, which is O(1) instead of O(log n).
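
A small sketch of what "just compute the bin" could look like for a stream, again with a square grid standing in for H3 cells. `GridIndex` and its methods are made up for this example.

```python
import math
from collections import defaultdict


class GridIndex:
    """Fixed-grid index: each insert computes a key and appends, with no rebalancing."""

    def __init__(self, size=1.0):
        self.size = size  # assumed cell size in degrees
        self.cells = defaultdict(list)

    def _key(self, lat, lon):
        return (math.floor(lat / self.size), math.floor(lon / self.size))

    def insert(self, item, lat, lon):
        # Cost does not depend on how many items are already stored.
        self.cells[self._key(lat, lon)].append(item)

    def query_cell(self, lat, lon):
        # Return everything stored in the cell containing this point.
        return list(self.cells.get(self._key(lat, lon), []))


index = GridIndex()
for event in [("a", 48.85, 2.35), ("b", 48.10, 2.90), ("c", 52.52, 13.40)]:
    index.insert(*event)
same_cell = index.query_cell(48.5, 2.5)
```

The tradeoff is that the cell size is fixed up front, so dense areas produce large buckets while a tree would adapt to the data.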

Not rocket science but different tradeoffs, that’s what engineering is all about.