daneel_w 5 hours ago

"Overall, we find a Postgres server can handle up to 144K of these writes per second. That’s a lot, equivalent to 12 billion writes per day."

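The quoted figure is easy to sanity-check. A minimal sketch, assuming the 144K rate is sustained at a constant level for a full 24 hours (the quote does not say so explicitly):

```python
# Sanity check of the quoted figure, assuming a constant rate for 24 hours.
WRITES_PER_SECOND = 144_000        # "144K" from the quote
SECONDS_PER_DAY = 24 * 60 * 60     # 86,400 seconds

writes_per_day = WRITES_PER_SECOND * SECONDS_PER_DAY
print(f"{writes_per_day:,} writes/day")  # 12,441,600,000, i.e. roughly 12 billion
```
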
Based on a problem I'm facing with Postgres today, I wonder if this really progresses as linearly as the article wants to make it out.

We're in the middle of evaluating Postgres as a replacement for MySQL, and are experiencing a notable slowdown for plain multi-row inserts due to index growth as soon as the table reaches just a couple of dozen million rows. It's an uncomplicated and flat (no constraints or foreign keys etc.) medium-width table of about 10-15 columns and a handful of non-composite btree indices - and/or hash indices; we've tried mixing and matching just to see what happens - but ingestion drops to less than half before the table even reaches 50m rows. At 100m rows the insertion performance is down to a fraction, and from there it just gets worse the larger the table and its indices grow. It's as if there's some specific exponential cut-off point where everything goes awry. However, if we simply remove all indices from the table, Postgres will happily insert hundreds of millions of rows at a steady and near identical pace from start to end. The exact same table and indices on MySQL, as closely as we can match between MySQL and Postgres, running on the same OS and hardware, maintain more or less linear insertion performance well beyond 500m rows.

Now, there's a lot to say about the whys and why-nots when it comes to keeping tables of this size in an RDBMS and application design relying on it to work out, and probably a fair amount more about tuning Postgres' config, but we're stumped as to why PG's indexing performance falters this early when contrasted against InnoDB/MySQL. 50-100m rows really isn't much. Would greatly appreciate if anyone with insight could shed some light on it and maybe offer a few ideas to test out.

(add.: during these stress tests the hardware is nowhere close to over-encumbered, and there's consistent headroom on both memory, CPU and disk I/O)

bijowo1676 2 hours ago | parent | next [-]

The problem is table design and write amplification. Every row insert triggers an update to every index, so you get the classic amplification problem.

Separate your table into Cold (with all indexes and bells and whistles) and Hot (heap table with no indexes except PK).

Insert as many rows as you want into Hot heap, and then move them in the background into cold in batches, so that index recalculation is amortized across many rows, instead of per-row.
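
The hot/cold split can be sketched roughly as follows. This is an illustration only: it uses Python's built-in sqlite3 as a stand-in so it runs anywhere, and the table names (`hot`, `cold`), columns and batch size are all hypothetical. It shows the shape of the batched move, not Postgres performance.

```python
import sqlite3

def flush_hot_to_cold(conn, batch_size):
    """Move up to batch_size of the oldest rows from hot to cold in one transaction."""
    with conn:  # commits on success, rolls back on error
        cutoff = conn.execute(
            "SELECT max(id) FROM (SELECT id FROM hot ORDER BY id LIMIT ?)",
            (batch_size,),
        ).fetchone()[0]
        if cutoff is None:  # hot table is empty
            return 0
        conn.execute("INSERT INTO cold SELECT * FROM hot WHERE id <= ?", (cutoff,))
        return conn.execute("DELETE FROM hot WHERE id <= ?", (cutoff,)).rowcount

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- hot: primary key only, so inserts stay cheap
    CREATE TABLE hot  (id INTEGER PRIMARY KEY, user_id INTEGER, payload TEXT);
    -- cold: carries the secondary indexes that reads need
    CREATE TABLE cold (id INTEGER PRIMARY KEY, user_id INTEGER, payload TEXT);
    CREATE INDEX cold_user_idx    ON cold (user_id);
    CREATE INDEX cold_payload_idx ON cold (payload);
""")

# Ingest goes to the hot table only.
conn.executemany(
    "INSERT INTO hot (user_id, payload) VALUES (?, ?)",
    [(i % 7, f"row-{i}") for i in range(2500)],
)
conn.commit()

# Background job: drain the hot table in batches.
moved_batches = []
while True:
    moved = flush_hot_to_cold(conn, 1000)
    if moved == 0:
        break
    moved_batches.append(moved)

print(moved_batches)  # [1000, 1000, 500]
```

In Postgres itself the move could probably be written as a single statement with a data-modifying CTE (`WITH moved AS (DELETE FROM hot ... RETURNING *) INSERT INTO cold SELECT * FROM moved`), and reads would need to query both tables.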

Another poster suggested partitioning, and that's the same idea: separate Hot and Cold data into partitions and keep the hot partition as a heap.

giovannibonetti 4 hours ago | parent | prev | next [-]

With some extra admin work, you can greatly increase your insert throughput, as long as the table load consists mostly of inserts:

1. Partition your table by range of a monotonic ID or timestamp. Notice the primary key will have to contain this column. A BIGINT id column should work fine.

2. Remove all the other indexes from the partitioned table. Add them to all the partitions, except the latest one. This way, the latest one can endure a tough write load, while the other ones work fine for reads.

3. Create an admin routine (perhaps with pg_cron) to create a new partition whenever the newest one is getting close to the limit. When the load moves to the newer partition, add indexes concurrently to the old one.

4. You'll notice the newest partition will be optimized for writes but not reads. You can offset some of that by replacing BTREE secondary indexes with BRIN [1], particularly the one with the bloom operator class (not to be confused with Postgres' regular Bloom indexes [2]). BRIN is a family of indexes more optimized for writes than reads. If the partition is not too large, it shouldn't be too bad to read from it.

5. Later you can merge partitions to avoid having too many of them. Postgres has commands for that, but I think they lock the whole table, so a safer bet is to copy small partitions into a new larger one and swap them manually.

[1] https://www.postgresql.org/docs/current/brin.html
[2] https://www.postgresql.org/docs/current/bloom.html
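
The partition rollover in this comment could be scripted along these lines. A minimal sketch that only generates the Postgres DDL as strings; the parent table name `events`, the partition width and the indexed columns are hypothetical, and the parent is assumed to have been created with `PARTITION BY RANGE (id)`.

```python
PARENT = "events"                               # hypothetical table name
ROWS_PER_PARTITION = 50_000_000                 # hypothetical partition width
SECONDARY_COLUMNS = ["user_id", "created_at"]   # hypothetical indexed columns

def partition_name(k):
    return f"{PARENT}_p{k}"

def create_partition_sql(k):
    """DDL for partition k, covering ids in [k * width, (k + 1) * width).

    Range bounds are inclusive at FROM and exclusive at TO. The new
    partition gets no secondary indexes, so it stays cheap to write to.
    """
    lo = k * ROWS_PER_PARTITION
    hi = lo + ROWS_PER_PARTITION
    return (
        f"CREATE TABLE {partition_name(k)} PARTITION OF {PARENT} "
        f"FOR VALUES FROM ({lo}) TO ({hi});"
    )

def index_partition_sql(k):
    """DDL to index partition k once the write load has moved past it.

    CREATE INDEX CONCURRENTLY cannot run inside a transaction block,
    so each statement has to be executed on its own.
    """
    name = partition_name(k)
    return [
        f"CREATE INDEX CONCURRENTLY {name}_{col}_idx ON {name} ({col});"
        for col in SECONDARY_COLUMNS
    ]

# Shortly before partition 0 fills up: create partition 1.
print(create_partition_sql(1))
# After the load has moved to partition 1: index partition 0.
for stmt in index_partition_sql(0):
    print(stmt)
```

A scheduled job (pg_cron, as suggested above, or an external scheduler) would run the first statement ahead of time and the second set once writes have moved on.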

subhobroto 4 hours ago | parent [-]

These are good suggestions but I'm apprehensive they might come back and say they have 64 GB (or less) of RAM or they are using PostgreSQL RDS on AWS or something.

I asked them for specifics.

keithnz 2 hours ago | parent [-]

I don't think it really matters in terms of their question though, given MySQL on the same specs doesn't have the problem and Postgres does. Quite clearly it has something to do with indexes, and the question is what wall Postgres is running into that causes the drop-off at such a low row count. If the answer is just "get more RAM", it kind of implies Postgres is not really that scalable. Especially if the drop-off is proportional to the number of rows.

andersmurphy 3 hours ago | parent | prev | next [-]

The problem is row locks and contention when using interactive transactions over the network. That can absolutely kill your performance with Postgres, and there's not really anything you can do to get around it (other than avoiding interactive transactions). [1]

[1] - https://andersmurphy.com/2025/12/02/100000-tps-over-a-billio...
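
A back-of-the-envelope way to see why latency matters here: while an interactive transaction holds a row lock, every other writer to that row waits, so the lock hold time caps throughput on that row. The sketch below assumes the lock is held for a fixed number of network round trips and ignores query execution time; the latencies and round-trip count are made-up illustrative values, not measurements.

```python
def max_tps_on_hot_row(round_trip_us, round_trips_holding_lock):
    """Upper bound on transactions/second that update one contended row.

    Assumes the row lock is held for the given number of client-server
    round trips and nothing else, so the bound is 1 / hold_time.
    """
    hold_time_us = round_trip_us * round_trips_holding_lock
    return 1_000_000 // hold_time_us

# Hypothetical latencies: same region vs. a distant client.
same_region = max_tps_on_hot_row(round_trip_us=1_000, round_trips_holding_lock=2)
cross_region = max_tps_on_hot_row(round_trip_us=50_000, round_trips_holding_lock=2)

print(same_region)   # 500
print(cross_region)  # 10
```

Under these assumptions, adding hardware does not raise the ceiling; only shortening the time the lock is held does.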

rconti 2 hours ago | parent [-]

We had an interesting architecture situation at work. Puppet Enterprise uses a single Postgres server. The company had moved from a recommendation of using a single PuppetDB API node (which fell over at high load) to running a PuppetDB API server on each compiler node.

That, however, came with its own set of problems. Of course you have to tune for concurrent connections as you scale wider, but there were much more serious contention issues than you'd expect, and the compilation times were terrible too. It turned out to be because those transactions locked the DB during their (synchronous) operations, and we had a globally distributed set of compilers in order to serve globally distributed traffic.

The solution ended up being to run a separate cluster of API servers in the same region as the DB. The expensive calls from the compilers to the API servers were largely async https so they didn't have to wait on the API nodes, and the API nodes could talk to the DB synchronously with low latency.

subhobroto 4 hours ago | parent | prev | next [-]

You've given us some idea of the volume of your data but there's no mention of what's ingesting it or how.

> during these stress tests the hardware is nowhere close to over-encumbered, and there's consistent headroom on both memory, CPU and disk I/O

This assertion is likely wrong - you're likely skipping over some metrics that have clues to what we need to know. Here are some questions to get the discussion moving.

- Is this PostgreSQL managed or self-hosted?

Your mention of "consistent headroom on both memory, CPU and disk I/O" gives me hope you're self-hosting it but I've heard the same thing in the past from people attempting to use RDS and wondering the same as you are, so no assumptions.

- Are you using COPY or multi-row INSERT statements?

- How much RAM does that server have?

- What is the fillfactor, max_wal_size and checkpoint_timeout?

- Is the WAL on NVMe?

- What does iostat or wa (I/O wait) show during the slowdown?

- Are random UUIDs (part of) the index?
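
On the UUID question: random keys tend to scatter inserts across the whole index, while sequential keys land at its right-hand edge. The toy model below illustrates that, assuming sorted keys are packed a fixed number per leaf page; it is not how Postgres lays out a btree in detail, and all sizes are arbitrary.

```python
import bisect
import random

def leaf_pages_touched(existing_sorted, new_keys, keys_per_page):
    """Count the distinct leaf pages a batch of inserts would land on.

    Toy model: the existing sorted keys are packed keys_per_page to a
    page, and a new key lands on the page holding its sorted position.
    """
    return len(
        {bisect.bisect_left(existing_sorted, k) // keys_per_page for k in new_keys}
    )

rng = random.Random(42)              # fixed seed for repeatability
N, BATCH, KEYS_PER_PAGE = 100_000, 1_000, 100

# Sequential ids: every new key is larger than all existing keys.
seq_existing = list(range(N))
seq_new = range(N, N + BATCH)

# Random 128-bit keys, standing in for random UUIDs.
rand_existing = sorted(rng.getrandbits(128) for _ in range(N))
rand_new = [rng.getrandbits(128) for _ in range(BATCH)]

sequential_pages = leaf_pages_touched(seq_existing, seq_new, KEYS_PER_PAGE)
random_pages = leaf_pages_touched(rand_existing, rand_new, KEYS_PER_PAGE)

print(sequential_pages)  # 1: all inserts hit the right-most page in this model
print(random_pages)      # hundreds of different pages for the same batch size
```

The more pages a batch touches, the more of the index has to be in memory to avoid random reads, which is why this question goes together with the one about RAM.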

Have you posted this question to https://dba.stackexchange.com/ yet?

If I were you, I would create a GitHub repo that has scripts that synthesize the data and reproduce the issues you're seeing.

justinclift 3 hours ago | parent | prev [-]

What's the underlying filesystem(s) you're using for the data storage?