Remix.run Logo
zerd 8 hours ago

I've had issues in a database with billions of rows where the PKs were a UUID. Indices on PK, and also foreign keys from other tables pointing to that table were pretty big, so much so that the indices themselves didn't all fit in memory. Like we would have an index on customer_id, document_id, both UUIDv4. DB didn't have UUID support, so they were stored as strings, so just 1 billion rows took ~30 GiB memory for PK index, 60GiB for the composite indices etc. So eventually the indices would not fit in memory. If we had UUID support or stored as bytes it might have halved it, but eventually become too big.

If you needed to look up say the 100 most recent documents, that would require ~100+ disk seeks at random locations just to look up the index due to the random nature of UUIDv4. If they were sequential or even just semi-sequential that would reduce the number of lookups to just a few, and they would be more likely to be cached since most hits would be to more recent rows. Having it roughly ordered by time would also help with e.g. partitioning. With no partitioning, as the table grows, it'll still have to traverse the B-Tree that has lots of entries from 5 years ago. With partitioning by year or year-month it only has to look at a small subset of that, which could fit easily in memory.