Doing the same here: grabbed a reasonably cheap Ryzen (Zen 2) server with 64GB of ECC RAM and 4x NVMe SSDs (2x 512GB + 2x 1024GB).
Runs pretty much this stack:
"Infrastructure":
- NixOS with ZFS-on-Linux, set up as 2 mirrors across the NVMes
- k3s (k8s 1.31)
- openebs-zfs provisioner (2 storage classes, one normal and one tuned for Postgres; see the sketch after this list)
- cnpg (CloudNativePG) operator for handling the databases
- k3s' built-in traefik for ingress
- tailscale operator for remote access to cluster control plane and traefik dashboard
- External DNS controller to automate DNS records
- cert-manager to handle Let's Encrypt
- Grafana Cloud stack for monitoring (metrics, logs, tracing)
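
For reference, a minimal sketch of what those two storage classes look like, written as jsonnet since that's what the rest of the templates use (the pool name, class names and parameter values here are just illustrative, not a recommendation):

    // Sketch of the two openebs-zfs (zfs-localpv) StorageClasses.
    // 'tank' and the class names are placeholders for this example.
    local zfsClass(name, extraParams) = {
      apiVersion: 'storage.k8s.io/v1',
      kind: 'StorageClass',
      metadata: { name: name },
      provisioner: 'zfs.csi.openebs.io',
      parameters: {
        poolname: 'tank',
        fstype: 'zfs',
        compression: 'lz4',
      } + extraParams,
      reclaimPolicy: 'Delete',
      volumeBindingMode: 'WaitForFirstConsumer',
    };

    [
      // General-purpose class, ZFS defaults (128k recordsize).
      zfsClass('zfs-general', {}),
      // Postgres-tuned class: 8k recordsize to match the 8k database pages.
      zfsClass('zfs-postgres', { recordsize: '8k' }),
    ]

WaitForFirstConsumer matters here because the volumes are node-local.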
Deployed stuff:
- Essentially 4 tenants right now
- 2x Keycloak + Postgres (2 diff. tenants)
- 2x headscale instances with postgres (2 diff. tenants, connected to keycloak for SSO)
- 1 Gitea with Postgres and memcached (for 1 tenant)
- 3 postfix instances providing simple email forwarding to sendgrid (3 diff. tenants)
- 2x dashy as homepage behind SSO for end users (2 tenants)
- 1x Zitadel with Postgres (1 tenant; planning to migrate the Keycloaks to it as a shared service)
- Youtrack server (1 tenant)
- Nextcloud with postgres and redis (1 tenant)
- tailscale-based proxy to bridge gitea and some machines that have issues getting through broken networks
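
Most of the "with Postgres" entries above are just a small cnpg Cluster resource pointed at the Postgres-tuned storage class, roughly like this (a sketch, with made-up names and sizes):

    // Sketch of a per-tenant CloudNativePG Cluster. The name, namespace,
    // size and storage class below are illustrative placeholders.
    {
      apiVersion: 'postgresql.cnpg.io/v1',
      kind: 'Cluster',
      metadata: { name: 'keycloak-db', namespace: 'tenant-a' },
      spec: {
        instances: 1,                    // one instance per tenant is enough here
        storage: {
          size: '10Gi',
          storageClass: 'zfs-postgres',  // the Postgres-tuned class from above
        },
        bootstrap: {
          initdb: { database: 'keycloak', owner: 'keycloak' },
        },
      },
    }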
Plus a few random things that are just musings on future deployments for now. The server is barely loaded and I can easily clone services around; in fact, a lot of the services above were instantiated from jsonnet templates.
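
The templates themselves are nothing fancy; the rough idea (a simplified, hypothetical example rather than the actual templates) is a function per service that takes the tenant-specific bits and returns the k8s objects:

    // Hypothetical shape of one of the jsonnet templates.
    // Image, ports and names are placeholders for illustration.
    local dashy(tenant) = {
      deployment: {
        apiVersion: 'apps/v1',
        kind: 'Deployment',
        metadata: { name: 'dashy', namespace: tenant },
        spec: {
          replicas: 1,
          selector: { matchLabels: { app: 'dashy' } },
          template: {
            metadata: { labels: { app: 'dashy' } },
            spec: {
              containers: [{
                name: 'dashy',
                image: 'lissy93/dashy:latest',
                ports: [{ containerPort: 80 }],
              }],
            },
          },
        },
      },
      service: {
        apiVersion: 'v1',
        kind: 'Service',
        metadata: { name: 'dashy', namespace: tenant },
        spec: { selector: { app: 'dashy' }, ports: [{ port: 80, targetPort: 80 }] },
      },
      // Ingress, SSO annotations, cnpg Cluster etc. get generated the same way.
    };

    // Instantiating the same service for another tenant is just another call:
    {
      'tenant-a': dashy('tenant-a'),
      'tenant-b': dashy('tenant-b'),
    }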
Deploying some things was more annoying than doing it by hand from a shell (Nextcloud in particular), but now I have a replicable setup, for example if I decide to move from one host to another.
The biggest downtime so far was dealing with poorly documented systemd-boot behaviour that made the server boot an older configuration and not apply newer ones.