Someone an hour ago
> so instead it's piecemeal implementations mostly in numeric packages like eigen and lapack.

Because that’s where the user-noticeable gains can be made. Using popcount in code you run once is going to shave off, maybe, 100 cycles. That isn’t worth the extra cycles of that approach.

Also, FTA: “and arguably the whole scheme should be replaced by finer-grained feature detection”. Such feature detection would lead to a combinatorial explosion of different binaries.

Finally, where it really matters, it’s not only a matter of recompiling the same code. For optimal performance, you also want to change loop unrolling strategy, stride count, etc.
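To put a rough number on the "combinatorial explosion" point, here is a minimal sketch that counts the builds you would need if every feature were toggled independently. The feature list is a hypothetical selection chosen for illustration, and `build_variants` is a made-up helper, not part of any packaging tool.

```python
from itertools import product

# Hypothetical set of independently toggled CPU features (illustrative only).
FEATURES = ["popcnt", "avx2", "bmi2", "fma"]


def build_variants(features):
    """Return every on/off combination of the given features as frozensets."""
    variants = []
    for flags in product([False, True], repeat=len(features)):
        # Keep only the features switched on in this combination.
        enabled = frozenset(f for f, on in zip(features, flags) if on)
        variants.append(enabled)
    return variants


variants = build_variants(FEATURES)
print(len(variants))  # 2 ** 4 = 16 distinct builds for just four features
```

Each extra independent feature doubles the count. Real features are not fully independent, so the true number would be lower, but the growth is still steep enough that grouping features into a few coarse levels is the usual compromise.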
seddonm1 41 minutes ago
Based on the now-deprecated Clear Linux, it does seem that these optimizations add up [0], so maybe we should be considering them more broadly?

[0] https://www.phoronix.com/review/clear-linux-48p-ubuntu/6