dikei 14 hours ago

Even for on-prem scenarios, I'd rather maintain a K8S control plane and let developer teams manage their own app deployments in their own little namespaces, than provision a bunch of new VMs each time a team needs some services deployed.
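Roughly, that model is just a namespace per team plus an RBAC binding scoped to it. A minimal sketch, with "team-a" standing in for a real team name:

    # Namespace-per-team sketch; "team-a" is a placeholder.
    apiVersion: v1
    kind: Namespace
    metadata:
      name: team-a
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: RoleBinding
    metadata:
      name: team-a-developers
      namespace: team-a
    subjects:
      - kind: Group
        name: team-a-developers
        apiGroup: rbac.authorization.k8s.io
    roleRef:
      kind: ClusterRole
      name: edit    # built-in role: manage most namespaced resources
      apiGroup: rbac.authorization.k8s.io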

mzhaase 9 hours ago | parent | next [-]

This, for me, is THE reason for using container management. Without containers, you end up with hundreds of VMs. Then, when the time comes to upgrade to a new OS, you have to go through the dance for every service:

- set up new VMs

- deploy software on new VMs

- have the team responsible give their ok

It takes forever, and in my experience, often never completes because some snowflake exists somewhere, or something needs a lib that doesn't exist on the new OS. VMs decouple the OS from the hardware, but you should still decouple the service from the OS. So that means containers. But then managing hundreds of containers still sucks.

With container management, I just do the following (roughly sketched below):

- add x new nodes to cluster

- drain x old nodes and delete them
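Assuming the new nodes have already joined the cluster, the old-node side is roughly this (node names are placeholders):

    # Stop new pods landing on the old node, then evict what's there so it
    # reschedules onto the new nodes.
    kubectl cordon old-node-1
    kubectl drain old-node-1 --ignore-daemonsets --delete-emptydir-data
    # Once it's empty, remove it from the cluster and decommission the VM.
    kubectl delete node old-node-1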

rtpg 13 hours ago | parent | prev | next [-]

Even as a K8s hater, I think this is a pretty salient point.

If you are serious about minimizing ops work, you can make sure people are deploying things in very simple ways, and in that world you are looking at _very easy_ deployment strategies relative to having to wire up VMs over and over again.

Just feels like lots of devs will take whatever random configs they find online and throw them over the fence, so now you just have a big tangled mess for your CRUD app.

guitarbill 13 hours ago | parent | next [-]

> Just feels like lots of devs will take whatever random configs they find online

Well, it usually isn't a mystery. Requiring a developer team to learn k8s, likely with no resources, time, or help, is not a recipe for success. You might have minimised someone else's ops work, but at what cost?

rtpg 13 hours ago | parent [-]

I am partly sympathetic to that (and am a person who does this), but I think too many devs are very nihilistic and use this as an excuse to stop thinking. Everyone in a company is busy doing stuff!

There's a lot of nuance here. I think ops teams are comfortable with what I consider "config spaghetti". Some companies are incentivised to ship stuff that's hard to configure manually. And a lot of other dynamics are involved.

But at the end of the day, if you copy-paste some config into a file, taking a quick look over it and asking yourself "how much of this can I actually remove?" is a valuable skill.
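As a made-up example, copy-pasted Deployment fragments are often full of lines that just restate Kubernetes defaults and can be deleted on sight:

    spec:
      replicas: 1               # 1 is the default -- removable
      revisionHistoryLimit: 10  # the default -- removable
      template:
        spec:
          dnsPolicy: ClusterFirst  # the default -- removable
          restartPolicy: Always    # the only valid value here -- removable
          containers:
            - name: app
              image: registry.example.com/app:1.0.0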

Really you want the ops team to be absorbing this as well, but this is where the constant atomization of teams makes things worse! Extra coordination costs plus the loss of a holistic view of the system mean the iteration cycles get too long.

But there are plenty of things where (especially if you are the one integrating something!) you should be able to look over a thing and see, like, an if statement that will always be false for your case and just remove it. So many modern ops tools are garbage and don't accept the idea of running something on your machine, but an if statement is an if statement is an if statement.

dikei 13 hours ago | parent | prev [-]

> Just feels like lots of devs will take whatever random configs they find online and throw them over the fence, so now you just have a big tangled mess for your CRUD app.

Agree.

To reduce the chance of a dev pulling some random configs out of nowhere, we maintain a Helm template that can deploy almost all of our services in a sane way: just replace the container image and ports. The deployment is probably not optimal, but further tuning can be done after the service is up and we have gathered enough metrics.
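The per-service values file ends up tiny; something in this spirit (the chart layout here is a placeholder, not our actual template):

    # values.yaml override for one service; everything else comes from the
    # shared chart's defaults.
    image:
      repository: registry.example.com/team-a/orders-api
      tag: "1.4.2"
    ports:
      - name: http
        containerPort: 8080
    # Conservative starting resources; tuned later once we have metrics.
    resources:
      requests: { cpu: 500m, memory: 512Mi }
      limits:   { cpu: 500m, memory: 512Mi }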

We've also put all our configs in one place, since we found that devs tend to copy from existing configs in the repo before searching the internet.

spockz 14 hours ago | parent | prev | next [-]

I can imagine. Do you have complete automation set up around maintaining the cluster?

We are now on-prem using "pet" clusters with namespace-as-a-service automated on top of them. This causes all kinds of issues, because workloads with different performance characteristics and requirements end up sharing the same nodes. They also share ingress and egress nodes, so any impact on those has a large blast radius. This leads to ever more rules and requirements.

Having dedicated, managed clusters, where each team can determine its own sizing and decide at what granularity workloads get deployed to which cluster, is paradise compared to that.

solatic 13 hours ago | parent [-]

> This causes all kinds of issues with different workloads with different performance characteristics and requirements.

Most of these issues can be fixed by setting resource requests equal to limits, which puts pods in the Guaranteed QoS class, and using integer CPU values so pods can get exclusive cores. You should also give developers an interface documenting which nodes in your datacenter have which characteristics, using node labels and taints, and force developers to pick specific node groups by specifying node affinity and tolerations; if you never bring nodes online without taints, workloads have to choose explicitly.
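Concretely, a pod spec along these lines (the label/taint names are placeholders):

    apiVersion: v1
    kind: Pod
    metadata:
      name: example
    spec:
      containers:
        - name: app
          image: registry.example.com/app:1.0.0
          resources:
            # requests == limits => Guaranteed QoS; whole-number CPU also
            # allows exclusive cores under the kubelet's static CPU manager.
            requests: { cpu: "2", memory: 4Gi }
            limits:   { cpu: "2", memory: 4Gi }
      # Only schedule onto the node group the team explicitly picked.
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: node-group
                    operator: In
                    values: ["highmem"]
      # Matching toleration for that node group's taint.
      tolerations:
        - key: node-group
          operator: Equal
          value: highmem
          effect: NoSchedule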

> They also share ingress and egress nodes so impact on those has a large blast radius.

This is true regardless of whether or not you use Kubernetes.

DanielHB 10 hours ago | parent | prev [-]

> than provisioning a bunch of new VMs each time a team need some services deployed.

Back in the old days before cloud providers, this was the only option. I started my career in the early 2010s and caught the tail end of this; it was not fun.

I remember my IT department refusing to set up git for us (we were using SVN before), so we just asked for a VM and set up a git repo on it ourselves to host our code.