tanelpoder 7 hours ago
I understand that it's the interrupt-based I/O completion workloads that suffered from IOMMU overhead in your tests? The IOMMU may add some interrupt remapping latency. I'd be interested in seeing:

1) interrupt counts (normalized to IOPS) from /proc/interrupts
2) "hardirqs -d" (bcc-tools) output for IRQ handling latency histograms
3) perf record -g output, to see if something inside the interrupt handling codepath takes longer (on bare metal you can see inside the hardirq handler code too)

It would be interesting to see whether, with the IOMMU enabled, each interrupt takes longer to handle on the CPU, or whether the handling time is roughly the same but interrupt delivery takes longer. There may be some interrupt coalescing going on as well (I don't know exactly what else gets enabled with the IOMMU).

Since interrupts are raised "randomly", independently of whatever app/kernel code is running on the CPUs, it's a bit harder to visualize total interrupt overhead in something like flamegraphs, as the interrupt activity is scattered all over the chart. I used the flamegraph search/highlight feature to visually identify how much time the interrupt detours took during stress test execution. Example here (scroll down a little): https://tanelpoder.com/posts/linux-hiding-interrupt-cpu-usag...
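For item 1, here is a rough sketch of what "normalized to IOPS" could look like in practice: take a snapshot of /proc/interrupts before and after the run, sum the per-CPU deltas for the device's IRQ lines, and divide by the number of completed I/Os. The function names, the "nvme" match pattern, and the idea of taking the I/O count from the benchmark's own report are all assumptions for illustration, not something from the tests discussed here.

```python
import re
from itertools import takewhile


def parse_interrupts(text, pattern="nvme"):
    """Return {irq: total count across CPUs} for IRQ lines matching pattern.

    `text` is the content of /proc/interrupts. The "nvme" default is only
    an assumption about how the device's IRQ lines are labelled.
    """
    lines = text.strip().splitlines()
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    totals = {}
    for line in lines[1:]:
        irq, _, rest = line.partition(":")
        fields = rest.split()
        # Leading numeric fields are the per-CPU counters; some summary
        # rows (e.g. ERR) carry fewer counters than there are CPUs.
        counts = list(takewhile(str.isdigit, fields[:ncpu]))
        description = " ".join(fields[len(counts):])
        if counts and re.search(pattern, description):
            totals[irq.strip()] = sum(int(c) for c in counts)
    return totals


def interrupts_per_io(before, after, io_count, pattern="nvme"):
    """Interrupts raised per completed I/O between two snapshots."""
    b = parse_interrupts(before, pattern)
    a = parse_interrupts(after, pattern)
    delta = sum(count - b.get(irq, 0) for irq, count in a.items())
    return delta / io_count


def snapshot(path="/proc/interrupts"):
    """Read the current interrupt counters (Linux only)."""
    with open(path) as f:
        return f.read()
```

The intended use would be to call snapshot() right before and right after the benchmark, then pass both strings plus the completed I/O count to interrupts_per_io(). A ratio well below 1 would hint at coalescing or batched completions, and comparing the ratio with and without IOMMU would show whether the number of interrupts changes or only their cost.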
eivanov89 6 hours ago
BTW, the whole situation with IRQ accounting disabled reminds me of the -fomit-frame-pointer case. For a long time there was no practical performance reason for it, but the option kept being used, making stacks slower and harder to build, both for perf analysis and for stack unwinding in languages like C++. After careful reading, I'm surprised how the small IRQ squares add up to 30%. I should search for interrupts the next time I inspect our flamegraphs.
eivanov89 7 hours ago
Unfortunately, we don't have proper measurements for IOPOLL mode with and without IOMMU, because initially we didn't configure IOPOLL properly. However, I bet that this mode will be affected as well, because the disk still has to write through the IOMMU. You suggest some very interesting measurements. I will keep them in mind and try them in the next experiments. I wish I had read this earlier so I could have applied it during the past runs :)