Remix.run Logo
lukas221 6 hours ago

I would argue that not using serialization isolation level by default is like not using a memory safe programming language by default.

Sure, sometimes it's too slow, but it should be the default.

Very few people can write correct database code at the other serialization levels. Most think they can, but it's harder than correct multi-threading, because databases do weird unintuitive things for performance.

zadikian 3 hours ago | parent | next [-]

In my experience, the performance hit is so bad that it's not feasible to use that way. It's also not strictly safer behavior-wise because retries can trip people up.

jiggawatts 5 hours ago | parent | prev | next [-]

I recently encountered a query that deadlocked on itself because it used a parallel execution plan and updated multiple indexes in a manner that the different threads could conflict with each other.

Naively one would expect that no individual UPDATE or INSERT statement could deadlock in isolation… but there you go.

If that is possible, the possibilities across multiple concurrent data mutating queries are beyond human comprehension!

Serializable should absolutely be the default!

Similarly, all columns should be automatically indexed to at least some degree, like Postgres BRIN indexes at a minimum.

Time and experience have shown that the vast majority of developers are pathologically unable to properly define all required indexes ahead of time.

hilariously 3 hours ago | parent | next [-]

I've definitely experienced the opposite, where indexes run rampant and are NN times the size of data (for the entire database!)

That being said also "required indexes ahead of time" is impossible - because query patterns change and things get released unless you mean something like "the required indexes for our obvious query patterns we just freaking released.

I dont think most columns need to be indexed because that is mostly nonsense, most columns never have a where clause applied to them or are sorted on themselves, so an index provides ??? value.

63stack 4 hours ago | parent | prev [-]

You are supposed to define indexes based on how you query the data, not ahead of time.

3 hours ago | parent | next [-]
[deleted]
jiggawatts 3 hours ago | parent | prev [-]

"Supposed to" is doing a lot of heavy lifting here.

As a consultant I come across a lot of CotS software, in-house or otherwise bespoke software, etc. Roughly 40% of the former has 100% of the minimum required number of indexes and approximately 5% of the latter. By "minimum", I mean the indexes required to avoid full scans of tables that will become large enough for this to be a problem in production.

"Disciple doesn't scale." is one of my favourite sayings now, for a reason!

1) Developers almost always work with toy data, and are hence insulated from poor indexing decisions. Problems turn up 'x' years from now. It is well established that humans learn poorly when consequences are delayed... by mere hours, let alone years!

2) DBAs and developers often have an adversarial relationship. A common consequence of this workplace dynamic is that developers aren't granted the required access to tune indexes, especially in production, which is where the issues manifest.

3) I've heard anecdotes, including here, along the lines of "XYZ cloud native / webscale database is so much faster than ABC traditional RDBMS!". Very often the difference is just that XYZ auto-indexes by default. CosmosDB, Google Firestore, Kusto, Elastic, Druid, and many columnar formats are in this category of "magically" faster!

I'm now 99% convinced that RDBMS needs to be reinvented for the modern fast-paced, vibe-coded, "I'm a fullstack(lol) dev" world where people simply don't have the bandwidth to pay attention to minutiae like on-disk sort order and filtered secondary indexes. A better fit for today's world would be a system that is: columnar by default like SAP HANA, compressed[1] by default, indexed by default (thanks to being columnar!), serializable by default, and "include batteries" like native queue capabilities so that nobody has to figure out cross-RDBMS complications like distributed transactions, outbox patterns, or deal with the consequences of a DBA rolling back one of two databases to a backup.

lmm 4 hours ago | parent | prev [-]

If you're going to use serialisable isolation level, why bother using a traditional RDBMS at all? At that point you're better off using a simpler datastore.

vlovich123 43 minutes ago | parent | next [-]

Do you mean to say “if you’re NOT going to use serialisable”? I think you missed the NOT and every reply seemed to think you were arguing a different point but your description about not using foreign keys and using Redis instead only makes sense if there’s a NOT there.

cwillu 4 hours ago | parent | prev [-]

Huh?

lmm 4 hours ago | parent [-]

I mean you're not really making use of MVCC etc. at that point. Foreign keys are far less relevant because your transactions are all fully atomic, so it doesn't really matter if your data is in an inconsistent state in the middle of a transaction, and conversely you've got no risk of e.g. adding a reference to a row that another transaction deleted concurrently. Why not just use e.g. Redis at that point?

trumpdong an hour ago | parent | next [-]

Serializable doesn't mean serialized. It means if two transactions access the same data, one must be delayed or aborted. It doesn't mean they all wait for each other.

mamcx 3 hours ago | parent | prev | next [-]

This is a misunderstanding of what serializable in an ACID datastore does, neither that trust developers without FK is always trouble, and that the suggestion of Redis show how much is lost here.

Big point: Serializable not exist alone in a decent ACID datastore, and no, less strict rules for the MOST important thing you have(your data) is NOT a good idea.

Over and over again Acid RDBMS have proven that trying to "relax" the rules in pursuit of performance or worse, mystical holy grails that have never been right or correct for a primary datastore, is a mistake. And then people goes back to them, because is the best tool for ALL the primary data store jobs. ALL OF THEM.

Is like the mythical C developer that "not need safety", that at least has more chance of be possible (after MANY passes over the code) that a datastore without safeguards.

lmm an hour ago | parent [-]

> Over and over again Acid RDBMS have proven that trying to "relax" the rules in pursuit of performance or worse, mystical holy grails that have never been right or correct for a primary datastore, is a mistake. And then people goes back to them, because is the best tool for ALL the primary data store jobs. ALL OF THEM.

On the contrary. The most successful RDBMS by far was MySQL in an era where it didn't have any kind of ACID (you could write the transaction keywords but they didn't do anything). As the story we're talking about now shows, RDBMSes are routinely deployed with transaction settings that their users don't understand, much less use; there are settings that would reduce bugs if anyone cared to use them, but no-one does. People cargo-cult the idea that they should be using an Acid RDBMS but they almost never actually want or need one.

oscillot an hour ago | parent [-]

The success there had more to do with it being a) free, at a time when there were far fewer of those options combined with that it b) ran on windows. The enterprise's pathological windows use for developer machines and the state of apples devices at the time and the near non-existence of linux enterprise laptops which continues today meant that they had the perfect product for that moment in history. I cannot overstate this: the people making these decisions think an acid transaction is a felony and not a term of art used by some of their employees.

vlovich123 41 minutes ago | parent | prev | next [-]

MVCC is not inherently non serializable. For example, Postgres adds serializability on top through SSI.

lukas221 3 hours ago | parent | prev [-]

people mostly use RDBMS because they want the advanced querying that SQL provides, not because of the isolation levels

but ignoring that, serializable isolation level means the database acts AS IF the transactions are serial. but most databases in fact will execute them concurently, with careful tracking to make sure they appear serial

Groxx 3 hours ago | parent [-]

TBH I think I've seen more database use than not specifically because it serves as the central race-resolver in a system, because doing that anywhere else is many, many times harder and more mistake-prone. Fancy querying has been much less needed (and is sometimes a significant code smell), and is often fully offloaded into a data warehouse style system to reduce the risk.

Sometimes though, yes definitely. It's hard to claim anything universal at all about databases.

(Unless you mean "being able to choose between different isolation levels", then yes, completely agreed. Very very few use anything but the default, somewhere below serializable, and it always concerns me unless they can describe exactly what they're intentionally allowing and why it's okay for their system. Most cannot.)