jakupovic 6 hours ago

Doing this at anything > 1k nodes is a pain in the butt. We decided to run many <100-node clusters rather than a few big ones.

kvrty 5 hours ago | parent | next

Same here. Control plane components that don't originate from the Kubernetes project itself - your ingress controllers, service meshes, etc. - start failing beyond a certain scale. So I don't usually take node counts from these benchmarks seriously for our kind of workloads. We run a bunch of sub-1k-node clusters.

liveoneggs 4 hours ago | parent | prev | next

Same. The control plane and various controllers just aren't up to the task.

preisschild 2 hours ago | parent | prev

Meh, I've had clusters with close to 1k nodes (with Cilium as the CNI) and didn't have major issues.

__turbobrew__ an hour ago | parent

When I was involved about a year ago, Cilium fell apart at around a few thousand nodes.

One of the main issues with Cilium is that its BPF maps scale with the number of nodes/pods in the cluster, so you get exponential memory growth as you add more nodes running the cilium-agent. https://docs.cilium.io/en/stable/operations/performance/scal...
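
A rough back-of-the-envelope sketch of that scaling behaviour (the pod density and per-entry byte costs below are made-up placeholders, not Cilium's real figures; it just assumes each agent keeps roughly one map entry per node and per pod in the cluster):

    # Toy model of per-agent BPF map growth as the cluster scales.
    # All constants are invented placeholders, NOT Cilium's actual map sizes.
    PODS_PER_NODE = 30              # assumed average pod density
    BYTES_PER_ENDPOINT_ENTRY = 64   # placeholder cost of one pod/ipcache entry
    BYTES_PER_NODE_ENTRY = 48       # placeholder cost of one node/tunnel entry

    def per_agent_map_bytes(nodes: int) -> int:
        # One agent tracks every node and every pod in the cluster,
        # so its map footprint grows with cluster size.
        pods = nodes * PODS_PER_NODE
        return pods * BYTES_PER_ENDPOINT_ENTRY + nodes * BYTES_PER_NODE_ENTRY

    def cluster_wide_map_bytes(nodes: int) -> int:
        # Each of the N agents keeps its own copy of these maps,
        # so summing over all agents multiplies the per-agent cost by N.
        return nodes * per_agent_map_bytes(nodes)

    for n in (100, 1_000, 5_000):
        print(f"{n:>5} nodes: {per_agent_map_bytes(n) / 2**20:6.1f} MiB per agent, "
              f"{cluster_wide_map_bytes(n) / 2**30:6.1f} GiB cluster-wide")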

oasisaimlessly 38 minutes ago | parent

Wouldn't that be quadratic rather than exponential?