Remix.run Logo
63stack 4 hours ago

You are supposed to define indexes based on how you query the data, not ahead of time.

3 hours ago | parent | next [-]
[deleted]
jiggawatts 3 hours ago | parent | prev [-]

"Supposed to" is doing a lot of heavy lifting here.

As a consultant I come across a lot of CotS software, in-house or otherwise bespoke software, etc. Roughly 40% of the former has 100% of the minimum required number of indexes and approximately 5% of the latter. By "minimum", I mean the indexes required to avoid full scans of tables that will become large enough for this to be a problem in production.

"Disciple doesn't scale." is one of my favourite sayings now, for a reason!

1) Developers almost always work with toy data, and are hence insulated from poor indexing decisions. Problems turn up 'x' years from now. It is well established that humans learn poorly when consequences are delayed... by mere hours, let alone years!

2) DBAs and developers often have an adversarial relationship. A common consequence of this workplace dynamic is that developers aren't granted the required access to tune indexes, especially in production, which is where the issues manifest.

3) I've heard anecdotes, including here, along the lines of "XYZ cloud native / webscale database is so much faster than ABC traditional RDBMS!". Very often the difference is just that XYZ auto-indexes by default. CosmosDB, Google Firestore, Kusto, Elastic, Druid, and many columnar formats are in this category of "magically" faster!

I'm now 99% convinced that RDBMS needs to be reinvented for the modern fast-paced, vibe-coded, "I'm a fullstack(lol) dev" world where people simply don't have the bandwidth to pay attention to minutiae like on-disk sort order and filtered secondary indexes. A better fit for today's world would be a system that is: columnar by default like SAP HANA, compressed[1] by default, indexed by default (thanks to being columnar!), serializable by default, and "include batteries" like native queue capabilities so that nobody has to figure out cross-RDBMS complications like distributed transactions, outbox patterns, or deal with the consequences of a DBA rolling back one of two databases to a backup.