mjb 4 hours ago

Good read.

I've always been a little confused about this framing of WSI. The observation that detecting read-write conflicts is sufficient for serializability dates back to at least Kung and Robinson in '81 (IIRC). It is true, though, and the observation that it's a minor change to an existing MVCC database's commit logic is theoretically correct.

It's not really practically correct, though. Writes kinda have to be resolved to updated keys, so detecting w-w conflicts is very easy. In a SQL database, though, reads can be predicates, or aggregations, or even indicate a lack of data (gaps). This makes practically implementing this scheme on real world workloads pretty tricky, both correctness-wise and performance-wise. Clearly possible, but quickly devolves into a bunch of optimizations around edge cases. Granted, it is easier in databases that don't need full SQL semantics.
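To make the asymmetry concrete, here is a minimal Python sketch of commit-time validation. Everything in it (the `Txn` and `Committed` types, `ww_conflict`, `rw_conflict`, the key names) is hypothetical and not taken from any real database. It also assumes predicates can be evaluated against a key alone, which is a big simplification: real SQL predicates range over row values, aggregates, and gaps.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass
class Committed:
    """A transaction that has already committed (hypothetical log entry)."""
    commit_ts: int
    write_keys: Set[str]


@dataclass
class Txn:
    """A transaction asking to commit (hypothetical)."""
    start_ts: int
    write_keys: Set[str] = field(default_factory=set)
    read_keys: Set[str] = field(default_factory=set)
    # Predicate reads, simplified to functions over a key.
    read_predicates: List[Callable[[str], bool]] = field(default_factory=list)


def ww_conflict(txn: Txn, log: List[Committed]) -> bool:
    """SI-style check: writes resolve to keys, so this is a set intersection."""
    return any(
        c.commit_ts > txn.start_ts and (c.write_keys & txn.write_keys)
        for c in log
    )


def rw_conflict(txn: Txn, log: List[Committed]) -> bool:
    """Read-write check: every concurrent write must be tested against
    point reads AND every predicate the transaction evaluated."""
    for c in log:
        if c.commit_ts <= txn.start_ts:
            continue  # committed before our snapshot, so we saw it
        for key in c.write_keys:
            if key in txn.read_keys:
                return True
            if any(pred(key) for pred in txn.read_predicates):
                return True  # includes inserts into a gap we scanned
    return False


# A concurrent transaction inserted "acct:3" after our snapshot.
log = [Committed(commit_ts=5, write_keys={"acct:3"})]

# We scanned all "acct:" rows (a predicate read) and wrote an unrelated key.
txn = Txn(
    start_ts=3,
    write_keys={"audit:1"},
    read_predicates=[lambda k: k.startswith("acct:")],
)

print(ww_conflict(txn, log))  # False
print(rw_conflict(txn, log))  # True
```

The write-write check never has to know what the transaction looked at. The read-write check has to remember every predicate and scan, which is where the size of read sets and the edge-case optimizations come in.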

We actually started here early in the design of Aurora DSQL, but changed our minds and picked SI based on data about what isolation levels people actually choose (vs what they say they choose), the difficulty that optimizing schemas and queries for good performance under serializability presents to application programmers (you have to be very very careful to read only what you need), and the general large size of read sets compared to write sets in relational workloads. We might end up doing serializability down the line, but the demand isn't there once people see the real world tradeoffs.

Amusing aside (not about the article linked here). It's super common to see people try to refute the performance cost of serializability using TPC-C. That's funny because TPC-C is serializable at SI, and never experiences write skew due to the structure of its workload.
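
For anyone who hasn't run into write skew, here is a toy Python sketch of the classic "at least one person on call" example. The `run` function, the names, and the in-memory dict standing in for a database are all made up for illustration, and it says nothing about TPC-C itself. It only shows the anomaly that SI allows and that read-write conflict detection rejects.

```python
def run(check_reads: bool) -> dict:
    """Run two concurrent transactions against a toy store.

    check_reads=False: validate write sets only (SI-like).
    check_reads=True:  validate read sets against concurrent writes.
    """
    db = {"alice": True, "bob": True}  # on-call flags; invariant: at least one True

    # Both transactions take their snapshot before either commits.
    snapshot_a = dict(db)
    snapshot_b = dict(db)

    def go_off_call(snapshot: dict, who: str):
        reads = set(snapshot)  # counting requires reading every row
        if sum(snapshot.values()) >= 2:
            return reads, {who: False}
        return reads, {}

    txns = [go_off_call(snapshot_a, "alice"), go_off_call(snapshot_b, "bob")]

    concurrent_writes = set()  # keys written by already-committed concurrent txns
    for reads, writes in txns:
        validated = reads if check_reads else set(writes)
        if validated & concurrent_writes:
            continue  # abort this transaction
        db.update(writes)
        concurrent_writes |= set(writes)
    return db


print(run(check_reads=False))  # {'alice': False, 'bob': False}
print(run(check_reads=True))   # {'alice': False, 'bob': True}
```

With write-set validation only, the two write sets are disjoint, so both commit and nobody is left on call. With read-set validation, the second transaction aborts because it read a row the first one wrote.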