This is one of those way down the road optimizations for folks in fairly rare scale situations in fairly rare tight loops.

Most of us are in the realm of the lowest hanging fruit being database queries that could be 100x faster and functions being called a million times a day that only need to be called twice.

▲

stego-tech 3 days ago | parent | next [-]

100% with you there. I can count one time in my entire 15 years where I had to pin a production workload for performance, and it was Hyperion.

In 99% of use cases, there’s other, easier optimizations to be had. You’ll know if you’re in the 1% workload pinning is advantageous to.

For everyone else, it’s an excellent explainer why most guides and documentation will sternly warn you against fussing with the NUMA scheduler.

	▲	toast0 3 days ago \| parent [-]
		> In 99% of use cases, there’s other, easier optimizations to be had. You’ll know if you’re in the 1% workload pinning is advantageous to. Cpu pinning can be super easy too. If you have an application that uses the whole machine, you probably already spawn one thread per cpu thread. Pinning those threads is usually pretty easy. Checking if it makes a difference might be harder... For most applications, it won't make a big difference, but some applications will see a big difference. Usually a positive difference, but it depends on the application. If nobody has tried cpu pinning your application lately, it's worth trying. Of course, doing something efficiently is nice, but not doing it is often a lot faster... Not doing things that don't need to be done has huge potential speedups. If you want to cpu pin network sockets, that's not as easy, but it can also make a big difference in some circumstances; mostly if you're a load balancer/proxy kind of thing where you don't spend much time processing packets, just receive and forward. In that case, avoiding cross cpu reads and writes can provide huge speedups, but it's not easy. That one, yeah, only do it if you have a good idea it will help, it's kind of invasive and it won't be noticable if you do a lot of work on requests.

▲

bboreham 3 days ago | parent | prev | next [-]

Whilst you’re right in broad strokes, I would observe that “the garbage-collector” is one of those tight loops. Single-threaded JavaScript is perhaps one of the best defences against NUMA, but anyone running a process on multiple cores and multiple gigabytes should at least know about the problem.

▲

frollogaston 3 days ago | parent | prev [-]

Yeah, I was once in this situation with a perf-focused software defined networking project. Pinning to the wrong NUMA node slowed it down badly.

Probably another situation is if you're working on a DBMS itself.