I think managing stateless infrastructure is much easier: if anything goes haywire, you can expect a readiness probe to fail, k8s quietly takes the instance out of rotation, and life continues with no downtime.

It is also perfectly possible to roll your own highly available Postgres setup, but that requires a whole other set of concerns: precise configuration, attention to detail, caring about the hardware, occasionally digging into kernel bugs, and so forth, which cloud providers happily handle behind the scenes. I'm very comfortable with low-level details, but I have never built my own cloud.

I do test my backups, but having to restore anything from backups means something has gone catastrophically wrong: I have downtime, and I have probably lost data. Everything needed to prevent that scenario is what makes me sweat a little.

lossolo a month ago | parent | next [-]

> occasionally digging into kernel bugs

No, it doesn't. I've been self-hosting a multi-node, highly available, and fault-tolerant PostgreSQL setup for years, and I've never had to go to that level. After reading your whole post, I'm not sure where you're getting your information from.

	tux3 a month ago | parent [-]
		Horror stories stick with me more than success stories, but I'm happy to take the feedback. I'm glad it went well for you, that's a small update for me.

JimBlackwood a month ago | parent | prev [-]

> occasionally digging into kernel bugs

Haha, been there! We recently had outages on kube-proxy due to a missing `--set-xmark` option in iptables-restore on Ubuntu 24.04.

On any stateful server, we always try to stay several major versions behind because of issues like the one above; that avoids most kernel bugs and related issues.