supriyo-biswas 2 days ago

Is an incremental backup of the database not possible? pgBackRest and similar tools can do this by creating a full backup followed by incrementals, with continuous WAL archiving covering the gaps in between.
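
For anyone unfamiliar, the basic flow looks roughly like this (the stanza name "main" is a placeholder, and this assumes pgbackrest is already configured with the repo and data paths):

  # full base backup, then cheaper incrementals against it
  pgbackrest --stanza=main --type=full backup
  pgbackrest --stanza=main --type=incr backup
  # WAL segments are shipped continuously in between, via
  # archive_command in postgresql.conf:
  #   archive_command = 'pgbackrest --stanza=main archive-push %p'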

For Postgres specifically you may also want to look at using hot_standby_feedback, as described in this recent HN article: https://news.ycombinator.com/item?id=44633933
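
On the standby that would be, roughly (hot_standby_feedback is reloadable, so no restart is needed; run as a superuser on the standby):

  psql -c "ALTER SYSTEM SET hot_standby_feedback = on;"
  psql -c "SELECT pg_reload_conf();"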

tetha 2 days ago | parent

On the big product clusters, incremental pgbackrest backups run for about 20 minutes. Full backups take somewhere between 12 and 16 hours, all taken from a sync standby managed by Patroni, and archiving all of that takes another 8 to 12 hours. It's a couple of terabytes of noncompressible data that has to move. That's fine, though, because this is an append-log-style dataset and we can take our time backing it up.
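
For context, backing up from a standby instead of the primary is a pgbackrest option; an illustrative config (host names, paths, and version are placeholders) looks something like:

  # /etc/pgbackrest.conf (illustrative sketch)
  [global]
  repo1-path=/var/lib/pgbackrest
  repo1-retention-full=2
  backup-standby=y          # read data files from the standby

  [main]
  pg1-host=primary.internal
  pg1-path=/var/lib/postgresql/16/main
  pg2-host=standby.internal
  pg2-path=/var/lib/postgresql/16/main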

We also have decently sized clusters with very active data and rather spicy recovery targets. On some of them, a full backup from the sync standby takes 4 hours, we need to pull an incremental backup at most 2 hours after that, but the long-term archiving process needs 2 to 3 hours to move the full backup to the archive. This is the first point at which filesystem snapshots (admittedly of the pgbackrest repo, not the database itself) become necessary to meet the SLOs and keep the system functioning; a sketch of that follows.
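
The snapshot trick is nothing fancy; a minimal sketch, assuming the repo lives on a ZFS dataset (dataset name and archive path are placeholders, LVM snapshots work similarly):

  snap="tank/pgbackrest@archive-$(date +%Y%m%d%H%M)"
  zfs snapshot "$snap"
  # archive from the read-only snapshot while the live repo
  # keeps receiving new backups
  rsync -a "/tank/pgbackrest/.zfs/snapshot/${snap#*@}/" /mnt/archive/
  zfs destroy "$snap"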

We do all of the high-complexity, high-throughput things recommended for Postgres, and it's barely enough on the big systems. These workloads are getting to the point of needing a lot more storage and network bandwidth.