bayindirh 2 days ago
You can always run your code under cachegrind or perf and see what actually happens. I managed to reach the practical IPC limits of the hardware I was running on. In theory I could have made the prefetcher happier with some matrix reordering, but looking back, I'm not sure how much performance that would have bought, since the FPU was already saturated at that point.
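
For anyone who wants to try the same check: counters can be collected with something like `perf stat -e cycles,instructions ./your_program` (cache behavior with `valgrind --tool=cachegrind ./your_program`), and IPC is just instructions divided by cycles. Below is a rough sketch of that arithmetic. The function names, the counter values, and the peak IPC of 4.0 are all made up for illustration; the real practical ceiling depends on the microarchitecture and the instruction mix.

```python
def ipc(instructions: int, cycles: int) -> float:
    """Instructions per cycle from two raw hardware counter values."""
    if cycles <= 0:
        raise ValueError("cycles must be positive")
    return instructions / cycles


def fraction_of_peak(measured_ipc: float, peak_ipc: float) -> float:
    """How close a measured IPC is to an assumed practical ceiling."""
    if peak_ipc <= 0:
        raise ValueError("peak_ipc must be positive")
    return measured_ipc / peak_ipc


if __name__ == "__main__":
    # Hypothetical counter values, not taken from any real run.
    measured = ipc(instructions=3_600_000_000, cycles=1_000_000_000)
    # Assumed ceiling; look up or measure the real one for your CPU.
    share = fraction_of_peak(measured, peak_ipc=4.0)
    print(f"IPC = {measured:.2f} ({share:.0%} of assumed peak)")
```

If the measured IPC is already near the ceiling and the floating-point units are the busy resource, data-layout changes aimed at the prefetcher are unlikely to show up as a large speedup.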