| > The index is 214 MB! That's almost half the size of the entire table. So the analysts are happy, but you? Not so much... This is part of a broader tradeoff: write amplification. You'd of course want the most precise index possible - but no matter how you cut it, you are incurring extra I/O per write - one for the tuple, plus one per index. How you index things is heavily influenced by the mix of reads and writes, and this is why we have data warehouses/read replicas in the first place: they let us avoid write amplification in the write path while still having fast filtered reads (that are slightly delayed). If you're dealing with <ridiculous number of users>, there is a good chance that you don't want to be putting BI/OLAP indices on your OLTP database. You probably don't have enough users to worry about this - but if you ever find that your writes are becoming an issue, this is something to consider. |
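A minimal sketch of that tradeoff, assuming a hypothetical events table (all names here are illustrative, not from the thread): each INSERT pays one heap write plus one write per index, so a partial index limits the extra write cost to the rows it actually covers.

    -- Hypothetical Postgres schema: every INSERT writes the heap tuple plus
    -- one entry per index on the table.
    CREATE TABLE events (
        id         bigserial PRIMARY KEY,
        user_id    bigint NOT NULL,
        created_at timestamptz NOT NULL DEFAULT now(),
        payload    jsonb
    );

    -- A partial index keeps the analysts' filtered reads fast while only
    -- incurring the extra index write for rows matching the predicate.
    CREATE INDEX events_errors_idx
        ON events (user_id, created_at)
        WHERE (payload ->> 'level') = 'error';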
| |
| ▲ | giovannibonetti 6 hours ago | parent | next [-] | | Clustered indexes only save up to 2x write amplification, and only in the very rare case where you're indexing the entire table (e.g. if it has very few columns). However, that is usually the least of your concerns with write amplification. If you don't batch your writes, you can easily get 100x write amplification. For a primary key or any other index not strongly correlated with your INSERTs, you can get perhaps another 100x write amplification even if you batch your writes. | | |
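A rough sketch of the batching point, reusing the hypothetical events table from above; the exact amplification factor depends on row width, page fill, and how well the index order matches insert order.

    -- Row-at-a-time: each statement commits on its own, so heap, index, and
    -- WAL writes are all paid per row.
    INSERT INTO events (user_id, payload) VALUES (1, '{"level": "info"}');

    -- Batched: many rows per statement (or COPY for bulk loads) amortize the
    -- per-commit and per-page costs across the whole batch.
    INSERT INTO events (user_id, payload) VALUES
        (1, '{"level": "info"}'),
        (2, '{"level": "error"}'),
        (3, '{"level": "info"}');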
| ▲ | Tostino 2 hours ago | parent | next [-] | | Clustered indexes aren't just about write amplification. They also reduce the reads needed to get the data. Sometimes by quite a bit. | |
| ▲ | SigmundA 6 hours ago | parent | prev [-] | | >in the very rare case where you're indexing the entire table (e.g. if it has very few columns). Not sure I follow. Most tables are accessed primarily one way (by primary key), and only sometimes in other ways for analysis. Having the PK written twice because it's almost always indexed is normally a waste, and a good candidate for a clustered index. So much so that many DBs like SQLite and MySQL always do clustered indexes on the primary key, because their storage engines are built such that tables are a b-tree anyway, vs PG, which has separate b-tree indexes and heap tables. MSSQL and Oracle give you a choice of whether the table is an index structure or a heap. If you have very specific use-case tables, they can typically have a clustered index and no secondary indexes; you can still scan them for ad-hoc analysis, but you get better insert performance and space usage because you aren't double-writing to the heap and a PK index like you would in PG. As for batched writes, that is a separate issue and has to do with whether batching even makes sense for durability: if you need to commit a single random row when something occurs, you can't batch that up and maintain consistency. If you're bulk loading data, sure, committing in batches is common practice there. Either way, clustered indexes could still mean 100x vs 200x write amplification if you have to insert both an index row and a heap row vs just a single clustered index row. |
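A sketch of the choice SigmundA describes, in SQL Server-style DDL (table names hypothetical): the same table either as a clustered index or as a heap with a separate PK index.

    -- Clustered: the table itself is the b-tree keyed on order_id, so the key
    -- is not stored twice and an insert touches a single structure.
    CREATE TABLE orders_clustered (
        order_id bigint NOT NULL,
        total    decimal(10,2) NOT NULL,
        CONSTRAINT pk_orders_clustered PRIMARY KEY CLUSTERED (order_id)
    );

    -- Heap: rows go to a heap and the PK lives in a separate nonclustered
    -- b-tree, roughly the heap-plus-index layout Postgres always uses.
    CREATE TABLE orders_heap (
        order_id bigint NOT NULL,
        total    decimal(10,2) NOT NULL,
        CONSTRAINT pk_orders_heap PRIMARY KEY NONCLUSTERED (order_id)
    );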
| |
| ▲ | ComputerGuru 5 hours ago | parent | prev [-] | | Another option would be a good way to place indexes on a different physical disk. You could use fast, ephemeral storage, like you can for a WAL, without amplifying the writes to the same device that is your expensive bottleneck. You could rebuild on data loss, but it would add complexity to detect out-of-sync indexes and tables. | | |
| ▲ | SigmundA 5 hours ago | parent [-] | | You mean tablespaces: https://www.postgresql.org/docs/current/manage-ag-tablespace... ? | | |
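A minimal Postgres tablespace sketch (path and names hypothetical). Note this only moves index I/O onto another device; Postgres still treats loss of that tablespace as data loss rather than something it can transparently rebuild.

    -- Create a tablespace on a separate (e.g. faster) device and put an
    -- index there, keeping the heap on the default tablespace.
    CREATE TABLESPACE fast_idx LOCATION '/mnt/nvme_scratch/pg_idx';

    CREATE INDEX events_user_idx
        ON events (user_id)
        TABLESPACE fast_idx;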
| ▲ | ComputerGuru 5 hours ago | parent [-] | | Maybe? I wasn't under the impression these could be safely lost or go out of sync without risking data loss? | | |
| ▲ | SigmundA 4 hours ago | parent [-] | | Wasn't aware you could put a WAL on an unreliable storage system either without risking data loss? It would be interesting for indexes, though: say, put them on a RAM drive and rebuild them on restart; if they aren't there, just fall back to table scans. MSSQL has memory-optimized tables that do this sort of thing: https://learn.microsoft.com/en-us/sql/relational-databases/i... | | |
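For reference, a SQL Server-style sketch of that memory-optimized idea (names hypothetical; it also assumes the database already has a MEMORY_OPTIMIZED_DATA filegroup configured): with SCHEMA_ONLY durability the table and its indexes live only in memory and come back empty after a restart.

    -- Memory-optimized, schema-only table: data and indexes are not
    -- persisted; only the schema survives a restart.
    CREATE TABLE dbo.hot_lookup (
        id    int NOT NULL PRIMARY KEY NONCLUSTERED,
        label nvarchar(100) NOT NULL
    ) WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_ONLY);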
| ▲ | ComputerGuru 4 hours ago | parent [-] | | If you lose the WAL you lose the data since the last merge, but there's no risk of corruption. The WAL handles missed syncs fine too, losing just that window of data. I don't know if or how Postgres records the transaction number in the index to be able to notice that it's out of date. If it does, I don't know of any solution to "catch up" the index besides recreating it, which would be OK if that were the only issue. But from my experience with out-of-date indexes (libc or ICU updates, where Postgres doesn't know if anything IS broken and just reports that it could be), there's no guarantee you'd even notice, and your app could be running completely broken until you rebuild. | | |
| ▲ | SigmundA 4 hours ago | parent [-] | | >If you lose the WAL you lose the data since the last merge but there’s no risk of corruption. That is not my understanding: https://www.postgresql.org/docs/current/app-pgresetwal.html >After running this command on a data directory with corrupted WAL or a corrupted control file, it should be possible to start the server, but bear in mind that the database might contain inconsistent data due to partially-committed transactions. You should immediately dump your data, run initdb, and restore. After restore, check for inconsistencies and repair as needed. | | |
| ▲ | ComputerGuru 4 hours ago | parent [-] | | I guess that depends on the definition of corrupted. It just describes what to do when the WAL is corrupt, but doesn’t say what conditions are considered corrupt. (We use ZFS so I haven’t run into torn writes.) |
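One way to make a stale or broken btree index visible in Postgres is the amcheck extension, with REINDEX as the only real "catch up" path; the index name below reuses the hypothetical one from the tablespace sketch above.

    -- Verify btree invariants for an index; raises an error if it is broken.
    CREATE EXTENSION IF NOT EXISTS amcheck;
    SELECT bt_index_check('events_user_idx'::regclass);

    -- If the check fails, the fix is a rebuild.
    REINDEX INDEX events_user_idx;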