benreesman 5 days ago

In 2025, if you need convenience and no red tape, you've got fly.io in the general case, and maybe Vercel or something similar if you're on a particular framework (there are some good ones for specific stacks).

If your needs go beyond that? Then you need real computers with real configuration, and for that you have OVH/Hetzner/Latitude, who will rent you MONSTER machines for the cost of some cheap-ass surplus 2017 Intel box on The Cloud.

And if you just want a blog or whatever? Zillion VPS options.

The traditional cloud in 2025 is for regulatory/process/corruption-capture extraction: the case for it on machine economics and developer productivity is fucking zero from what I've seen. Maybe there's some edge case where a completely unencumbered team is better off with DMV-trip permissions theatre, remnant Intel racked with noisy neighbors at massive markup, and no support recourse.

nine_k 5 days ago | parent

(1) How does fly.io reliability compare to AWS, GCP, or maybe Linode or DO?

(2) What do you do if your large Hetzner server starts to show signs of malfunction? How soon would you be able to replace it, and how easily?

(2a) What do you do when your large Hetzner server just dies? I see that this happens rarely, but what's your contingency plan, if any?

(3) What do you do when your load is highly spiky? Do you reserve bare metal capacity for the biggest peak you expect to serve, because it's so much cheaper than running an elastic serverless architecture of the same capacity anyway?

(4) Considering that your stack still includes many components, how do you manage them, and how expensive is the management overhead? Do you need an extra SRE?

These are not rhetorical questions; I'd love to hear from real practitioners! (E.g. Stack Overflow used to do deep dives into their few-big-servers architecture.)

runako 5 days ago | parent

These are great questions.

A key factor underlying all of this is understanding, from a business/organizational perspective, your actual uptime requirements. Google may aim at five nines with the budget to achieve it, but many banks have routine planned downtime. If you don't know your objectives, you will have trouble making the tradeoffs necessary to get there. As a hypothetical: would your business choose 99.999% uptime (about 26 seconds of downtime per month) over 99.99% (about 4.3 minutes) if that caused infra costs to rise by 50% or more? If you said you could cut infra costs by 50% by planning a short weekly maintenance window, how would that resonate?
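Those downtime budgets are just arithmetic; a quick illustrative Python snippet (assuming an average ~30.44-day month):

    # Downtime budget per month implied by an availability target.
    # Illustrative arithmetic only; assumes an average month of ~30.44 days.
    SECONDS_PER_MONTH = 30.44 * 24 * 60 * 60  # ~2,629,746 seconds

    for label, availability in [("three nines", 0.999),
                                ("four nines", 0.9999),
                                ("five nines", 0.99999)]:
        budget = SECONDS_PER_MONTH * (1 - availability)
        print(f"{label} ({availability:.3%}): ~{budget:,.0f} s/month (~{budget / 60:.1f} min)")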

Speaking to a few of these, in my experience:

2) (Not at Hetzner specifically, but at a dedicated host.) You have backups and recovery plans, and redundancy where it makes sense. You might run your database with a replica. If you are serving Web traffic, maybe you keep a hot spare. You are also still allowed to use cloud services where it makes sense: you can back up to S3 and use things like SQS or KMS if you don't want to run them yourself. It's worth noting that you may not get advance notice; I recall our service being impacted by a fire at a datacenter that IIRC was caused by a traffic accident on a nearby highway. The point is that you have to design resilience into the system. Fortunately, this is well-trod ground.
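To illustrate how small the moving parts can be, here's a minimal nightly-backup sketch in Python using boto3. The bucket name, database name, and paths are hypothetical placeholders, and it assumes pg_dump and AWS credentials are already set up:

    # Minimal nightly off-site backup sketch: dump Postgres, ship to S3.
    # Bucket, database, and paths are hypothetical placeholders.
    import datetime
    import subprocess

    import boto3

    BUCKET = "example-offsite-backups"  # hypothetical bucket name

    def backup_to_s3() -> None:
        stamp = datetime.datetime.now(datetime.timezone.utc).strftime("%Y%m%dT%H%M%SZ")
        dump_path = f"/tmp/db-{stamp}.dump"

        # Custom-format dump so pg_restore can do selective restores later.
        subprocess.run(
            ["pg_dump", "--format=custom", "--file", dump_path, "appdb"],
            check=True,
        )

        # Ship the dump off-site; lifecycle rules on the bucket can expire old ones.
        boto3.client("s3").upload_file(dump_path, BUCKET, f"postgres/{stamp}.dump")

    if __name__ == "__main__":
        backup_to_s3()

Run it from cron and you have off-site backups; the restore drill is the part you actually have to rehearse.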

It would not be a terrible failover option to have something like an autoscaling group at AWS ready to step in if the dedicated cluster goes offline. Keep that group scaled to zero until it's needed. Put the cloud behind your cheap dedicated capacity.
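A minimal sketch of that pattern, assuming you've already baked an AMI and created an Auto Scaling group: a watchdog that scales the group up from zero when the dedicated cluster stops answering. The group name, health-check URL, and thresholds are all hypothetical, and DNS/load-balancer cutover is left out:

    # Watchdog sketch: if the dedicated cluster stops answering, scale a
    # pre-provisioned AWS Auto Scaling group up from zero as a failover.
    # Group name, URL, and thresholds are hypothetical; assumes boto3 and
    # AWS credentials are configured and a launch template/AMI exists.
    import time
    import urllib.request

    import boto3

    ASG_NAME = "failover-web"                    # hypothetical ASG name
    HEALTH_URL = "https://example.com/healthz"   # hypothetical health endpoint
    FAILURES_BEFORE_FAILOVER = 3

    def dedicated_cluster_healthy() -> bool:
        try:
            with urllib.request.urlopen(HEALTH_URL, timeout=5) as resp:
                return resp.status == 200
        except OSError:
            return False

    def main() -> None:
        asg = boto3.client("autoscaling")
        failures = 0
        while True:
            failures = 0 if dedicated_cluster_healthy() else failures + 1
            if failures >= FAILURES_BEFORE_FAILOVER:
                # Bring the standby capacity online; traffic cutover is separate.
                asg.set_desired_capacity(
                    AutoScalingGroupName=ASG_NAME, DesiredCapacity=4
                )
                break
            time.sleep(30)

    if __name__ == "__main__":
        main()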

3) See above. In my case, we over-provisioned because it's cheap to do so. I did not do this at the time, but I would probably look at running a replicated database with a hot standby on another server.
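If you do run a hot standby, the main thing to watch is replication lag. A minimal monitoring sketch against Postgres's pg_stat_replication view (psycopg2; the connection string and alert threshold are hypothetical):

    # Replication-lag check sketch for a Postgres primary with a streaming
    # hot standby. DSN and threshold are hypothetical; alerting (here just
    # a print) would be wired into whatever you already use.
    import psycopg2

    PRIMARY_DSN = "host=db1.internal dbname=appdb user=monitor"  # hypothetical
    LAG_ALERT_BYTES = 64 * 1024 * 1024  # alert past ~64 MiB of unreplayed WAL

    def check_replication_lag() -> None:
        with psycopg2.connect(PRIMARY_DSN) as conn, conn.cursor() as cur:
            cur.execute(
                """
                SELECT application_name,
                       pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS lag_bytes
                FROM pg_stat_replication
                """
            )
            rows = cur.fetchall()
            if not rows:
                print("ALERT: no standby connected to the primary")
            for name, lag_bytes in rows:
                if lag_bytes is None or lag_bytes > LAG_ALERT_BYTES:
                    print(f"ALERT: standby {name} lag is {lag_bytes} bytes")

    if __name__ == "__main__":
        check_replication_lag()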

4) It has not been my experience that "modern" cloud deployments require fewer SRE resources. Like water running downhill, cloud projects seek complexity.