Wholly agree. Too often we think about scale far too early.

I've seen very simple services get bogged down in needing to be "scalable" so they're built so they can be spun up or torn down easily. Then a load balancer is needed. Then an orchestration layer is needed so let's add Kubernetes. Then a shared state cache is needed so let's deploy Redis. Then we need some sort of networking layer so let's add a VPC. That's hard to configure though so let's infra-as-code it with terraform. Then wow that's a lot of infrastructure so let's hire an SRE team.

Now nobody is incentivized to remove said infrastructure because now jobs rely on it existing so it's ossified in the organization.

And that's how you end up with a simple web server that suddenly exploded into costing millions a year.

▲

AlotOfReading 6 months ago | parent | next [-]

In a former job, I wrote a static PWA to do initial provisioning for robots. A tech would load the page, generate a QR code, and put it in front of the camera to program the robot.

When I looked into having this static page hosted on internal infra, it would have also needed minimum two dedicated oncalls, terraform, LB, containerization, security reviews, SLAs, etc.

I gave up after the second planning meeting and put it on my $5 VPS with a letsencrypt cert. That static page is still running today, having outlived not only the production line, but also the entire company.

▲

Aurornis 6 months ago | parent | next [-]

> When I looked into having this static page hosted on internal infra, it would have also needed minimum two dedicated oncalls, terraform, LB, containerization, security reviews, SLAs, etc.

In my experience there are two kinds of infrastructure or platform teams:

1) The friendly team trying to help everyone get things done with reasonable tradeoffs appropriate for the situation

2) The team who thinks their job is to make it as hard as possible for anyone to launch anything unless it satisfies their 50-item checklist of requirements and survives months of planning meetings where they try to flex their knowledge on your team by picking the project apart.

In my career it’s been either one or the other. I know it’s a spectrum and there must be a lot of room in the middle, yet it’s always been one extreme or the other for me.

	▲	stackskipton 6 months ago | parent | next [-]
		Having worked as an Ops person at the second kind of place, most of the time it ends up like that because Ops becomes a dumping ground, so they throw up walls in a vain attempt to stem the tide. Application throwing error logs? Page out Ops. We made a mistake in our deploy? Oh well, page out Ops so they can help us recover. Security has found a vulnerability in a Java library? Sounds like an Ops problem to us.
	▲	nucleardog 6 months ago | parent | prev [-]
		I've observed the same and, anecdotally, how much of a pain the ops team is is a function of how much responsibility shifts from the dev team to the ops team during the project lifecycle.

		Basically, is the ops team there to support the developer/development team, or the product? In the case of the development team, the ops team will tend to be willing to provide advice and suggestions but be flexible on tooling and implementation. In the case of the product, the ops team will tend to be a lot more rigid and inflexible.

		This plays out in things like:

		When the PWA becomes critical for the production line and then "is not working" at 3AM, who is getting paged? If it's the developer, then ops is "supporting the developer". If it's the ops team getting called to debug and fix some project they've never laid eyes on before at 3AM, then it's the product. They are, naturally, going to start caring a lot more about how it is set up, deployed, and supported because nobody likes getting woken for work at 3AM.

		When some project's dependencies start running past EOL, who is going to update it? If it's the developer, then ops is "supporting the developer". If the ops team isn't empowered to give a deadline and have _someone else_ responsible for keeping the project functioning, then they're supporting the product, and by letting it be deployed they have effectively committed to maintaining it in perpetuity. They're going to start caring a lot more about what sort of languages, frameworks, etc are used and specifically how projects are set up, because context switching to one of dozens of different projects at 3AM is hard enough as-is without having to also be trying to learn some new framework du jour.

		(And before anyone says "well the updates probably aren't necessary this is just ops being a pain"--think of the case of a project relying on a GCP product that's being shut down or some Kubernetes resource that's been changed. In one case inaction will cause the project to fail, in the other ops' action will cause it to fail. See the first point as to who is going to get called about that. Even in the happy case, consistency brings automation and allows the team to support a _class_ of deployments instead of individual products.)

		I don't think places exist stably in the middle ground because it's a painful place to be for very long. The responsibility and the control land on separate people, and the person with the responsibility but without the control is generally going to work to wrest control to reduce misery. In the case where ops acts as if they're supporting the developers but is in practice supporting the product, it's not going to take too many 3AM calls before they start pushing back on how the product's deployed and supported.

		I've been both of those ops guys you describe. When I was the "checklists, meetings, and picking the project apart" guy it had nothing to do with me wanting to make anyone's life difficult or flexing my knowledge. It had to do with the 3AM calls waking myself, my wife, and my newborn up. If I was taking on responsibility for keeping your _product_ functional through its useful life, yeah, I wasn't going to let people dump stuff on my plate unless I had some reasonable basis to believe it wasn't going to substantially increase my workload and result in more middle of the night calls. The checklists were my way of trying to provide consistency and visibility into the process of reducing my own pain, not my way of trying to create pain for others.

▲

beng-nl 6 months ago | parent | prev [-]

I am reminded of “I forgot how to count that low”

https://news.ycombinator.com/item?id=28988281

▲

roncesvalles 6 months ago | parent | prev | next [-]

Also, I think people vastly overestimate how much uptime their application really needs and vastly underestimate how reliable a single VPS can be.

I currently have VPSes running on both lowend and big cloud providers that have been running for years with no downtime except when it restarts for updates.

▲

kqr 6 months ago | parent [-]

> no downtime except when it restarts for updates.

This sounds a little like saying "all of North America except the U.S."

I don't think people are worried about random breakdowns on a single VPS, but scheduled updates are still downtime, and downtime causes revenue loss regardless of why it happened.

Any time a service is important enough I ask for two servers and a load balancer specifically to handle deployments and upgrade windows transparently. But! I agree services are usually less important than people think.

▲

RadiozRadioz 6 months ago | parent [-]

> upgrade windows

Ok, that explains this and the above comment. The last time I had to restart anything to apply an OS update was when I moved to a new RHEL LTS version, the lifespan of which is about 10 years. And there are many ways to do similar GNU/Linux upgrades without a restart at all.

Does Windows Server really need to restart for updates like normal Windows? If so, that's hilariously crap and I'm glad I've never had to touch it.

Edit: not saying a single VPS is fine if it's GNU/Linux, just remarking on the "restart to update" thing they mentioned

	▲	HappMacDonald 6 months ago | parent | next [-]
		> to handle deployments and upgrade windows transparently

		GP might have meant "upgrade: Windows(tm)", or he might have meant "windows of time which we have allocated to upgrading the server", and on my first reading I interpreted the second without a single shred of thought towards the possibility of the first.
	▲	stackskipton 6 months ago | parent | prev [-]
		Most people have to apply patches, and if they don't install kpatch, a restart is generally required to make sure everything is using the new versions. Yes, you can restart all the services instead, but that is probably only slightly less downtime than a full reboot on most VPSes these days.

▲

ajayvk 6 months ago | parent | prev | next [-]

Having a single process web/app server simplifies things operationally. I am building https://github.com/claceio/clace, which is an application server for teams to deploy internal tools. It runs as a single process, which implements the webserver (TLS certs management, request routing, OAuth etc) as well as an app server for deploying apps developed in any language (managing container lifecycle, app upgrades through GitOps etc).

▲

benoau 6 months ago | parent | prev | next [-]

Shed a tear for Heroku, they made all this go away such a long time ago but ultimately squandered their innovation and the ~decade lead they had on other thinking in this fashion.

▲

the__alchemist 6 months ago | parent | next [-]

Could you please clarify? I haven't noticed any impact to Heroku on my web applications; it.. just works, anecdotally. They send periodic mandatory upgrade emails re database and application stack, but they have been harmless so far; going back a decade.

▲

benoau 6 months ago | parent [-]

They went from leading / pioneering horizontal scalability and database deploying and scaling and orchestration to "quiet-quitting" 15 years ago and doing almost nothing ever since - today they're barely worthy of mention in any discussion on any tech that solves these problems.

▲

the__alchemist 6 months ago | parent [-]

We define worth. I mention Heroku as a great way to make web applications at various scales. Reliable, and easy to use.

▲

nine_k 6 months ago | parent [-]

Heroku was / is like a mainframe. You get provisioning, scaling, configuration, etc all sorted out for you, as long as you pay through the nose.

▲

ksec 6 months ago | parent | next [-]

I kind of think Railway and Fly.io are taking that spirit forward? Although I would love it if Salesforce would just sell Heroku to somebody else who would take it forward.

▲

the__alchemist 6 months ago | parent | prev [-]

So, it's a price issue? My confusion / out-of-the-loop is thus: I hear that Heroku has gone downhill, is no longer recommended etc, yet I've noticed no degradation personally. Works well, is stable, has all the benefits I initially liked it for.

	▲	benoau 6 months ago | parent [-]
		I mean their prices have never been great, but the sadness is because they should have WON - Heroku should have been the standard for deploying and managing hosting instances and databases and stuff on any cloud.

▲

bigfatkitten 6 months ago | parent | prev [-]

Salesforce is what happened to Heroku.

▲

colechristensen 6 months ago | parent | prev | next [-]

It was always fun reimplementing some process 100x faster with a pipeline of grep and such on my laptop than somebody's hadoop cluster or whatever it was.

▲

Ferret7446 6 months ago | parent | prev [-]

I don't think you're agreeing with what the article is saying.

The article seems to be saying, instead of using CGI which spawns a process per request, to have a single Web server binary in Go/whatever. Which is totally reasonable and per my understanding what everyone already does nowadays (are any greenfield projects still using CGI?)

CGI is a "clever 'Unixy' hack" to add dynamism to early web servers. It stopped being "relevant" a long time ago IMO.
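To make the CGI versus long-running server distinction concrete, here is a minimal sketch, not the article's actual implementation. It is in Python rather than Go purely for illustration, and `CounterHandler`, `start_server`, and `fetch` are hypothetical names. The point it tries to show: a single server process can keep state in memory between requests, whereas a CGI script starts from a fresh process on every request and would need a file or database to do the same.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# In-process state. It survives across requests because the server is one
# long-lived process; a CGI script would start from zero on every request.
hits = 0
hits_lock = threading.Lock()


class CounterHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that reports how many requests it has served."""

    def do_GET(self):
        global hits
        with hits_lock:
            hits += 1
            body = f"hit {hits}\n".encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def start_server():
    # Port 0 asks the OS for any free port.
    server = ThreadingHTTPServer(("127.0.0.1", 0), CounterHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch(server):
    # Issue one GET request against the running server and return the body.
    conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
    try:
        conn.request("GET", "/")
        return conn.getresponse().read().decode()
    finally:
        conn.close()
```

Calling `fetch` twice against the same `start_server()` instance returns an incrementing counter, with no external storage involved.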

In fact, I think your diatribe actually contradicts the article.

Basically, the article is saying that they went with the "simple" CGI approach which ended up creating more complexity than using the slightly more complex dedicated binary. The author essentially followed your advice which ended up causing more complexity and hacks.

The moral of the story is, you need to use the right tool for the job, and know when to switch. Sometimes that is the simple path, sometimes that is not.