Remix.run Logo
karussell 2 days ago

We already use carrotsearch internally so you could replace the java util classes like HashMap and HashList with it to reduce memory usage a bit. But it won't help much. E.g. all data structures (in any standard library btw) do usually double their size at some point when their size increases and then copy from the old internal array to the new internal array, which means that you need roughly 3x the current size and if that happens roughly at the end of the import process you have a problem. For that reason we developed DataAccess (inmemory or MMAP possible) which is basically a large List but 1. increases only segment by segment and 2. allows more than 2 billion items (signed int).

Another trick for planet size data structure could be to use a List instead of the Map and the OSM ID as index. Because the memory overhead of a Map compared to a List is huge (and you could use DataAccess) and the OSM IDs for planet are nearly adjacent or at least have not that many gaps (as those gaps are refilled I think).

All these tricks (there are more!) are rather tricky&low level but necessary for memory efficiency. A simpler way for your use case could be to just use a database for that, like MapDB or sqlite. But this might be (a lot) slower compared to in-memory stuff.