Remix.run Logo
stickfigure a day ago

It never occurred to me that Postgres is more efficient when inserting monotonic values. It's the nature of B+ trees so it makes sense. But in the world of distributed databases, monotonic inserts create hot partitions and scalability problems, so evenly-distributed ids are preferred.

In other words, "don't try this with CRDB".

chuckadams a day ago | parent | next [-]

It's the nature of B+ trees, multiplied by the nature of clustered indexes: if you use a UUIDv4 as a primary key, your entire row gets moved to random locations, which really sucks when you normally retrieve them sequentially. With a non-clustered index (say, your UUIDv4 id you use for public APIs when you don't want to leak the v7 info) then you'll still get more fragmentation with the random data, but it's something autovacuum can usually keep up with. But it's more work it has to do on top of everything else it does.

masklinn a day ago | parent [-]

Gp mentioned Postgres, which does not have clustered indexes. It has table clustering, which is a point operation rewriting the entire table but not a persistent property.

chuckadams 17 hours ago | parent [-]

Ah, I forgot CLUSTER was something run by hand on PG. Same footgun then, but you have to load and aim it yourself instead of being fully automatic like it is in MySQL, where it appears you can't opt out of clustering by the PK (similar story in SQL Server, but you can change which index it clusters by). Thanks for the clarification.

evanelias 16 hours ago | parent [-]

This is a bit pedantic, but you’re conflating MySQL with InnoDB. (In MySQL’s model, you can opt out of having a clustered index by choosing a storage engine that doesn’t use clustering.)

In practical terms, choosing a non-InnoDB storage engine is pretty rare with upstream MySQL, but perhaps slightly less so in Percona Server or MariaDB.

baq a day ago | parent | prev | next [-]

Leaky abstractions in databases are one of the reasons every developer should read the table of contents of the hot databases used by the things he’s working on. IME almost no one does that.

therealdrag0 12 hours ago | parent | prev [-]

Can you elaborate on the hot partition bit?