Remix.run Logo
rvitorper a day ago

Does anyone have performance issues with uuidv4? I worked with a db with 10s of billions of rows, no issues whatsoever. Would love to hear the mileage of fellow engineers

zerd 6 hours ago | parent | next [-]

I've had issues in a database with billions of rows where the PKs were a UUID. Indices on PK, and also foreign keys from other tables pointing to that table were pretty big, so much so that the indices themselves didn't all fit in memory. Like we would have an index on customer_id, document_id, both UUIDv4. DB didn't have UUID support, so they were stored as strings, so just 1 billion rows took ~30 GiB memory for PK index, 60GiB for the composite indices etc. So eventually the indices would not fit in memory. If we had UUID support or stored as bytes it might have halved it, but eventually become too big.

If you needed to look up say the 100 most recent documents, that would require ~100+ disk seeks at random locations just to look up the index due to the random nature of UUIDv4. If they were sequential or even just semi-sequential that would reduce the number of lookups to just a few, and they would be more likely to be cached since most hits would be to more recent rows. Having it roughly ordered by time would also help with e.g. partitioning. With no partitioning, as the table grows, it'll still have to traverse the B-Tree that has lots of entries from 5 years ago. With partitioning by year or year-month it only has to look at a small subset of that, which could fit easily in memory.

cipehr a day ago | parent | prev | next [-]

What database were you using? For example with SQL server, by default it clusters data on disk by primary key. Random (non-sequential) PKs like uuidv4 require random cluster shuffling to insert a row “in the middle” of a cluster, increasing io load and causing performance issues.

Postgres on the other hand doesn’t do clustered indexing on the PK… if I recall correctly.

rvitorper a day ago | parent | next [-]

Postgres. It was also a single instance, which made it significantly easier. But nice to know that this is an issue on SQL Server

masklinn a day ago | parent [-]

Postgres is not immune to uuid issues, just less sensitive, uuidv4 still does not play well with btree indexes, bloating them and impacting their performance.

ahoka a day ago | parent [-]

Do you also require your users to register in the alphabetical order of their names?

rkomorn a day ago | parent | next [-]

New viral marketing idea just dropped: registration open to b-names only! Get in now before you lose your chance forever to the c-names!

20 hours ago | parent | prev [-]
[deleted]
ahoka a day ago | parent | prev [-]

Then cluster it differently? The whole problem uuidv7 in databases solves is a non-issue in most cases.

crazygringo a day ago | parent | prev [-]

Honestly not really. Yes random keys make inserts slower. But if inserts are only 1% of your database load, then yeah it's basically no issues whatsoever.

On the other hand, if you're basically logging to your database so inserts are like 99% of the load, then it's something to consider.

rvitorper a day ago | parent [-]

Makes sense. Thanks for the comment