spockz 14 hours ago

I can imagine. Do you have a complete automation setup for maintaining the cluster?

We are now on-prem using “pet” clusters with automated namespace-as-a-service on top of them. This causes all kinds of issues, because workloads with different performance characteristics and requirements end up sharing a cluster. They also share ingress and egress nodes, so any impact on those has a large blast radius. This leads to more rules and requirements.

Having dedicated, managed clusters, where everyone can decide their own sizing and which workloads to deploy to which cluster, is paradise compared to that.

solatic 13 hours ago

> This causes all kinds of issues with different workloads with different performance characteristics and requirements.

Most of these issues can be fixed by setting resource requests equal to limits, which puts pods in the Guaranteed QoS class, and by using integer CPU values. You should also have an interface with developers that explains which nodes in your datacenter have which characteristics, expressed as node labels and taints. Force developers to pick specific node groups by specifying node affinity and tolerations, and enforce that by never bringing a node online without a taint.
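
A rough sketch of what that could look like, written as a Python dict that mirrors a Pod manifest. The `example.com/node-group` label/taint key, the `NoSchedule` effect, and both helper names are made up for illustration; your cluster will have its own conventions. The QoS check is simplified too: it compares quantity strings literally, so it would not treat `"1"` and `"1000m"` as equal.

```python
import json

# Hypothetical label/taint key; substitute whatever your cluster uses.
NODE_GROUP_KEY = "example.com/node-group"


def guaranteed_pod(name, image, cpu, memory, node_group):
    """Build a Pod manifest with requests == limits and an integer CPU count,
    pinned to one node group via node affinity plus a matching toleration."""
    if int(cpu) != cpu or cpu < 1:
        raise ValueError("cpu must be a whole number of cores")
    resources = {"cpu": str(int(cpu)), "memory": memory}
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [{
                "name": name,
                "image": image,
                # Identical requests and limits for both CPU and memory.
                "resources": {
                    "requests": dict(resources),
                    "limits": dict(resources),
                },
            }],
            # Only schedule onto nodes labelled with the chosen group.
            "affinity": {"nodeAffinity": {
                "requiredDuringSchedulingIgnoredDuringExecution": {
                    "nodeSelectorTerms": [{"matchExpressions": [{
                        "key": NODE_GROUP_KEY,
                        "operator": "In",
                        "values": [node_group],
                    }]}],
                },
            }},
            # Tolerate the taint that (by assumption) every node in the group carries.
            "tolerations": [{
                "key": NODE_GROUP_KEY,
                "operator": "Equal",
                "value": node_group,
                "effect": "NoSchedule",
            }],
        },
    }


def is_guaranteed(pod):
    """Simplified check for the Guaranteed QoS class: every container sets CPU
    and memory limits, and requests (if given) equal those limits."""
    for container in pod["spec"]["containers"]:
        res = container.get("resources", {})
        requests, limits = res.get("requests", {}), res.get("limits", {})
        for resource in ("cpu", "memory"):
            if resource not in limits:
                return False
            # An omitted request defaults to the limit in Kubernetes.
            if requests.get(resource, limits[resource]) != limits[resource]:
                return False
    return True


if __name__ == "__main__":
    pod = guaranteed_pod("batch-worker", "example.com/worker:1.0", 2, "4Gi", "high-mem")
    print(json.dumps(pod, indent=2))
```

Dumping the dict as JSON gives something `kubectl apply -f` accepts. The affinity pulls the pod toward the right nodes and the taint keeps everything else off them, which is why you want both.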

> They also share ingress and egress nodes so impact on those has a large blast radius.

This is true regardless of whether or not you use Kubernetes.