wetpaste 12 hours ago

RE: slow autoscaling

Maybe the cloud providers could help here by always keeping a small pool of machines online and ready to join a cluster, provided the end user accepts some compromise on how those machines are configured. I guess that doesn't solve image pulling, though. Pre-warming nodes is an annoying problem to solve.

The best solution I've been able to come up with is: Spegel (lightweight p2p image caching) + Karpenter (dynamic node autoscaling) + low-priority pods that hold onto some extra nodes. It's not perfect, though.
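
For anyone unfamiliar with the low-priority placeholder trick, here is a rough sketch in Python that just builds the two manifests as dicts and prints them as JSON (which kubectl also accepts). The names ("overprovisioning", "placeholder"), the priority value, the pause image tag, and the 60% sizing are all my own assumptions for illustration, not anything specific to Spegel or Karpenter. The idea is that each placeholder requests more than half of a node, so two of them never fit on the same node, and any real workload with a higher priority can preempt them.

```python
import json


def overprovisioning_manifests(spare_nodes, node_cpu_millicores, node_memory_mib,
                               request_percent=60,
                               image="registry.k8s.io/pause:3.9"):
    """Build a PriorityClass and a placeholder Deployment (hypothetical names).

    Assumes homogeneous nodes. With request_percent > 50, two placeholder
    pods cannot share a node, so `spare_nodes` replicas hold roughly
    `spare_nodes` nodes.
    """
    priority_class = {
        "apiVersion": "scheduling.k8s.io/v1",
        "kind": "PriorityClass",
        "metadata": {"name": "overprovisioning"},
        # Negative value: lower than the default priority (0) of normal pods,
        # so the scheduler may evict placeholders to fit real workloads.
        "value": -10,
        "globalDefault": False,
        "description": "Placeholder pods that real workloads may preempt.",
    }

    # Integer arithmetic avoids float rounding in the resource strings.
    cpu = node_cpu_millicores * request_percent // 100
    memory = node_memory_mib * request_percent // 100

    deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": "placeholder"},
        "spec": {
            "replicas": spare_nodes,
            "selector": {"matchLabels": {"app": "placeholder"}},
            "template": {
                "metadata": {"labels": {"app": "placeholder"}},
                "spec": {
                    "priorityClassName": priority_class["metadata"]["name"],
                    "containers": [{
                        "name": "pause",
                        "image": image,  # does nothing, only reserves resources
                        "resources": {"requests": {
                            "cpu": f"{cpu}m",
                            "memory": f"{memory}Mi",
                        }},
                    }],
                },
            },
        },
    }
    return priority_class, deployment


if __name__ == "__main__":
    # Example: keep two 4-vCPU / 16 GiB nodes warm (numbers are made up).
    for manifest in overprovisioning_manifests(2, 4000, 16384):
        print(json.dumps(manifest, indent=2))
```

When a real pod arrives, a placeholder gets preempted, goes Pending, and the autoscaler brings up a replacement node in the background. You still pay for the idle nodes, which is a big part of why it's not perfect.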

p_l 3 hours ago

1. Do some capacity planning

2. Apply appropriate changes to application resources (like parameters for spreading pods around)

3. Add descheduler[1] or similar tool to force redistribution of pods

4. Configure your cluster autoscaling params according to values from step (1) and have it autoscale before nodes are too heavily loaded.
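
To make steps (1) and (4) a bit more concrete, here is a back-of-the-envelope sketch in Python. It assumes you measure capacity in "pod slots" and know your peak pod growth rate and how long a new node takes to become ready. The function name and all the numbers are hypothetical, and a real cluster would need the same calculation per resource (CPU, memory) rather than a single pod count.

```python
import math


def scale_up_utilization(total_pod_slots, peak_growth_pods_per_min,
                         node_ready_minutes):
    """Return the utilization (0.0 to 1.0) at which scale-up should start.

    Reasoning: while a new node boots, the cluster must absorb
    peak_growth_pods_per_min * node_ready_minutes new pods from its
    existing free capacity, so that much headroom has to be left over
    at the moment scale-up is triggered.
    """
    needed_headroom = math.ceil(peak_growth_pods_per_min * node_ready_minutes)
    if needed_headroom >= total_pod_slots:
        # The cluster is too small to ever absorb a burst on its own:
        # capacity planning says to raise the baseline node count instead.
        return 0.0
    return 1 - needed_headroom / total_pod_slots


if __name__ == "__main__":
    # Made-up numbers: 200 pod slots, bursts of 10 pods/min,
    # and 4 minutes for a new node to join and pull images.
    threshold = scale_up_utilization(200, 10, 4)
    print(f"start scaling up at about {threshold:.0%} utilization")
```

The slower your nodes come up (image pulls included), the lower that threshold has to be, which is the tradeoff between paying for idle headroom and waiting on cold nodes.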