spockz 3 months ago

I can imagine. Do you have a complete automation setup for maintaining the cluster?

We are currently on-prem, using “pet” clusters with automated namespace-as-a-service on top of them. This causes all kinds of issues with different workloads with different performance characteristics and requirements. They also share ingress and egress nodes so impact on those has a large blast radius. This leads to more rules and requirements.

Having dedicated, managed clusters, where everyone can decide their own sizing and which workloads to deploy to which cluster, is paradise compared to that.

solatic 3 months ago

> This causes all kinds of issues with different workloads with different performance characteristics and requirements.

Most of these issues can be fixed by setting resource requests equal to limits and using integer CPU values, so that pods get the Guaranteed QoS class. You should also give developers an interface that explains which nodes in your datacenter have which characteristics, expressed as node labels and taints. Force developers to pick specific node groups by specifying node affinity and tolerations, and never bring nodes online without taints.
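
To make that concrete, here is a minimal sketch in Python that builds such a Pod manifest as a plain dict. The helper names, the `workload-class` label, and the `dedicated` taint key are hypothetical and only for illustration; the manifest fields themselves (`resources`, `affinity.nodeAffinity`, `tolerations`) are standard Pod spec fields. The QoS check is deliberately simplified: it compares quantities as strings and ignores init containers. Whole-core CPU values matter mainly if the kubelet runs the static CPU manager policy, which is an assumption about your setup.

```python
def guaranteed_pod(name, image, cpu, memory, node_label, taint):
    """Build a Pod manifest (plain dict) with requests == limits and node pinning.

    node_label and taint are (key, value) pairs; the keys used by callers
    are hypothetical and depend on how your nodes are labelled and tainted.
    """
    if not isinstance(cpu, int) or cpu < 1:
        raise ValueError("cpu must be a whole number of cores")
    resources = {"cpu": str(cpu), "memory": memory}
    label_key, label_value = node_label
    taint_key, taint_value = taint
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [{
                "name": name,
                "image": image,
                # Identical requests and limits for both cpu and memory.
                "resources": {"requests": dict(resources),
                              "limits": dict(resources)},
            }],
            # Only schedule onto nodes carrying the chosen label.
            "affinity": {"nodeAffinity": {
                "requiredDuringSchedulingIgnoredDuringExecution": {
                    "nodeSelectorTerms": [{"matchExpressions": [{
                        "key": label_key,
                        "operator": "In",
                        "values": [label_value],
                    }]}],
                },
            }},
            # Tolerate the taint that keeps other workloads off those nodes.
            "tolerations": [{
                "key": taint_key,
                "operator": "Equal",
                "value": taint_value,
                "effect": "NoSchedule",
            }],
        },
    }


def is_guaranteed(pod):
    """Simplified Guaranteed-QoS check: every container has cpu and memory
    limits, and requests (which default to limits when omitted) equal them."""
    for container in pod["spec"]["containers"]:
        res = container.get("resources", {})
        requests, limits = res.get("requests", {}), res.get("limits", {})
        for resource in ("cpu", "memory"):
            if resource not in limits:
                return False
            if requests.get(resource, limits[resource]) != limits[resource]:
                return False
    return True
```

Dumping the returned dict with `json.dumps` gives something you can feed to `kubectl apply -f -`, since kubectl accepts JSON as well as YAML.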

> They also share ingress and egress nodes so impact on those has a large blast radius.

This is true regardless of whether or not you use Kubernetes.

spockz 3 months ago

For the different workloads, the issue is more that all the nodes in the cluster are the same, so a mix of memory-intensive, CPU-intensive, and IO-intensive workloads is hard to schedule and hard to bring to a proper utilisation rate. On top of that, setting requests and limits properly does mean that our Java apps reserve multiple cores even when they handle little traffic and are basically idling. Golang apps scale better there, especially towards 0.
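
A rough sketch of the utilisation problem, with made-up numbers: assume every node is a hypothetical 16-core / 64 GiB machine and count how many pods of a given shape fit by their requests. It ignores system-reserved resources and daemonsets, so treat it as an illustration only.

```python
def packing(node_cpu, node_mem_gib, pod_cpu, pod_mem_gib):
    """Return (pods that fit on one node, CPU utilisation, memory utilisation),
    counting only the pods' resource requests."""
    fit = min(node_cpu // pod_cpu, node_mem_gib // pod_mem_gib)
    return fit, fit * pod_cpu / node_cpu, fit * pod_mem_gib / node_mem_gib


# Hypothetical node size, identical across the whole cluster.
print(packing(16, 64, 4, 4))    # CPU-heavy pods: 4 cores, 4 GiB each
print(packing(16, 64, 1, 16))   # memory-heavy pods: 1 core, 16 GiB each
```

With these numbers, four CPU-heavy pods fill the cores while leaving three quarters of the memory unused, and four memory-heavy pods do the opposite. Unless the scheduler manages to mix both shapes on the same node, one resource stays stranded.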

When running on “bare” VMs, each VM is its own member of the network. The pods in the cluster use an overlay network, and egress is limited to egress nodes, which are now shared by all workloads.

Having dedicated K8s clusters would reduce the sharing of network ingress and egress, and would let me choose the VM size for my workloads.