lukax an hour ago

NUMA can cause really crappy performance. We deployed a Go-based LLM gateway on Kubernetes, running on a server with hundreds of CPU cores. We didn't explicitly set GOMAXPROCS, so the Go runtime defaulted to the host's core count and scheduled goroutines across many different CPUs; the gateway constantly used 200% CPU and GC was causing latency spikes. Then we set GOMAXPROCS=8 and all the performance issues went away. Until recently, Kubernetes didn't work well with NUMA.
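The fix described here boils down to capping GOMAXPROCS. The Go runtime reads the `GOMAXPROCS` environment variable at startup, so setting `GOMAXPROCS=8` in the pod spec needs no code change at all. Go 1.25 and later also lower the default to the container's cgroup CPU limit when one is set. For older toolchains, here is a minimal sketch of doing it in code. `pickMaxProcs` and `maxProcsCap` are hypothetical names, and the cap of 8 is just the value from this comment, not a general recommendation.

```go
package main

import (
	"fmt"
	"os"
	"runtime"
	"strconv"
)

// maxProcsCap is a hypothetical upper bound taken from the comment's
// GOMAXPROCS=8; tune it for your own workload.
const maxProcsCap = 8

// pickMaxProcs returns the GOMAXPROCS value to use. A positive integer in
// the GOMAXPROCS environment variable wins (the runtime has already applied
// it at startup); otherwise the CPU count is capped at limit.
func pickMaxProcs(envValue string, numCPU, limit int) int {
	if n, err := strconv.Atoi(envValue); err == nil && n > 0 {
		return n
	}
	if numCPU < limit {
		return numCPU
	}
	return limit
}

func main() {
	n := pickMaxProcs(os.Getenv("GOMAXPROCS"), runtime.NumCPU(), maxProcsCap)
	prev := runtime.GOMAXPROCS(n) // returns the previous setting
	fmt.Printf("GOMAXPROCS: %d -> %d (NumCPU=%d)\n", prev, n, runtime.NumCPU())
}
```

Capping rather than hard-coding keeps small nodes (fewer than 8 CPUs) from being oversubscribed, while an explicit environment override still takes precedence.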

re-thc 29 minutes ago

Is this on AMD? I wonder whether it's all down to NUMA or to their CCD architecture (though these days Intel and everyone else do something similar to some extent).

toast0 6 minutes ago

Hundreds of cores likely means two sockets, so you've got NUMA there.

Scaling to large core counts has a lot of gotchas.
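A quick way to confirm NUMA on a Linux node is to count the nodes the kernel reports online in `/sys/devices/system/node/online`, which holds a range list such as `0-1`. A single socket can also be split into several nodes (for example via AMD EPYC's NPS BIOS setting), so this is more reliable than counting sockets. A minimal sketch, assuming Linux with sysfs mounted; `countNodes` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// countNodes parses a Linux range list such as "0-1" or "0,2-3" (the format
// of /sys/devices/system/node/online) and returns how many IDs it covers.
func countNodes(list string) (int, error) {
	total := 0
	for _, part := range strings.Split(strings.TrimSpace(list), ",") {
		if part == "" {
			continue
		}
		lo, hi, isRange := strings.Cut(part, "-")
		start, err := strconv.Atoi(lo)
		if err != nil {
			return 0, err
		}
		end := start
		if isRange {
			if end, err = strconv.Atoi(hi); err != nil {
				return 0, err
			}
		}
		if end < start {
			return 0, fmt.Errorf("bad range %q", part)
		}
		total += end - start + 1
	}
	return total, nil
}

func main() {
	data, err := os.ReadFile("/sys/devices/system/node/online")
	if err != nil {
		fmt.Println("could not read NUMA info (not Linux, or sysfs not mounted):", err)
		return
	}
	n, err := countNodes(string(data))
	if err != nil {
		fmt.Println("unexpected format:", err)
		return
	}
	fmt.Printf("NUMA nodes online: %d\n", n)
}
```

More than one node means memory access latency depends on which CPU a goroutine happens to run on, which is one reason spreading a small service across every core can backfire.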
