anal_reactor 12 hours ago
Can someone explain to me what the problem is? I know very little about the Linux kernel, but I'm curious. I've tried reading a little, but it's jargon on top of jargon.
alienchow 11 hours ago
I'm not familiar with the jargon either, but based on some reading, it comes down to how the latest kernel handles process preemption. Postgres uses spinlocks to protect shared memory in very short critical sections. A spinlock is a loop that keeps retrying to acquire a lock without sleeping, hence "spinning". Previous kernels allowed these processes to run under PREEMPT_NONE, a kernel preemption setting. This setting tells the kernel to let the process holding the lock complete its work before doing anything else. The latest kernel removed this behavior and now interrupts processes that hold a spinlock. So if a process holding a lock gets interrupted, all the other Postgres processes that need the same lock spin in place for much longer, leading to performance degradation.
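To make the "spinning" part concrete, here is a toy sketch in Python. It is not how Postgres implements its spinlocks: `SpinLock`, `worker`, and `run` are made-up names, and `threading.Lock` with a non-blocking acquire only stands in for an atomic test-and-set. The point is just that a waiter never sleeps, so if the holder is descheduled, every waiter burns CPU until the holder runs again.

```python
import threading


class SpinLock:
    """Toy spinlock: retry a non-blocking acquire in a tight loop, never sleep."""

    def __init__(self):
        # Stand-in for an atomic test-and-set word (an assumption of this sketch).
        self._flag = threading.Lock()
        # Count of failed attempts; updated without synchronization, so approximate.
        self.spins = 0

    def acquire(self):
        # Busy-wait: keep retrying until the lock is free.
        while not self._flag.acquire(blocking=False):
            self.spins += 1

    def release(self):
        self._flag.release()


def worker(lock, counter, iterations):
    """Increment a shared counter inside a very short critical section."""
    for _ in range(iterations):
        lock.acquire()
        try:
            counter[0] += 1
        finally:
            lock.release()


def run(n_threads=4, iterations=10_000):
    """Run several workers against one lock; return (final count, wasted spins)."""
    lock = SpinLock()
    counter = [0]
    threads = [
        threading.Thread(target=worker, args=(lock, counter, iterations))
        for _ in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter[0], lock.spins
```

Calling `run()` returns the final counter and a rough count of wasted spins. The spin count varies from run to run, and it grows whenever a thread is switched out while it still holds the lock, which is the same effect described above on a much smaller scale.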
tijsvd 10 hours ago
From what I understand from the follow-up: postgres uses shared memory for buffers. A new connection reads this shared memory while holding a lock. In postgres, each connection is handled by a forked process, not a new thread. When such a forked process reads a page of that memory for the first time, even if the page already exists, it causes a minor page fault, which traps into the kernel so it can update the memory mapping tables. The operation under the lock is only a few instructions, but if it takes longer than expected, that causes lock contention. Is this a regression in how the kernel handles minor faults? The whole thing is then made worse because it's a spinlock, so all waiting processes contend for the CPUs, which adds to kernel processing. It is mitigated by using huge pages, which dramatically reduce the number of mapping entries and faults. I reckon it could also be mitigated in postgres by pre-faulting all shared memory early?
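A rough sketch of the two mitigations, with assumptions stated up front: the 8 GiB buffer size is a made-up figure, 4 KiB and 2 MiB are the usual x86-64 base and huge page sizes, and `mapping_entries` and `prefault` are hypothetical helpers, not anything from postgres. The pre-fault loop only shows the technique inside a single process; whether touching pages early would actually help freshly forked backends is exactly the open question.

```python
import mmap


def mapping_entries(region_bytes, page_bytes):
    """Pages needed to map a region, i.e. an upper bound on first-touch faults."""
    return -(-region_bytes // page_bytes)  # ceiling division


def prefault(buf, page_bytes=mmap.PAGESIZE):
    """Write one byte per page so each page is mapped now, not at first real use."""
    touched = 0
    for offset in range(0, len(buf), page_bytes):
        buf[offset] = 0  # the write forces the page to be faulted in
        touched += 1
    return touched


GIB = 1024 ** 3
BUFFER_BYTES = 8 * GIB  # hypothetical shared buffer size

small = mapping_entries(BUFFER_BYTES, 4 * 1024)         # 4 KiB pages
huge = mapping_entries(BUFFER_BYTES, 2 * 1024 * 1024)   # 2 MiB huge pages
print(small, huge, small // huge)  # 2097152 4096 512

# Small anonymous mapping, touched page by page in this process only.
region = mmap.mmap(-1, 64 * mmap.PAGESIZE)
pages = prefault(region)
```

With these numbers, huge pages cut the entries for the same region by a factor of 512, which is why far fewer faults can land inside the locked section.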