bayindirh 2 days ago

You can always go through cachegrind or perf and see what happens with your code.

I managed to reach the practical IPC limits of the hardware I was running on. In theory I could have made the prefetcher happier with some matrix reordering, but looking back, I'm not sure how much performance that would have gained, since the FPU was already saturated at that point.
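For anyone who wants to check the same thing on their own code: IPC is just retired instructions divided by core cycles, and both counters come out of something like `perf stat -e cycles,instructions ./your_binary`. A minimal sketch of the arithmetic follows; the `ipc` helper and the counter values are made up for illustration and are not numbers from the run described here.

```python
def ipc(instructions: int, cycles: int) -> float:
    """Return instructions retired per core cycle."""
    if cycles <= 0:
        raise ValueError("cycles must be positive")
    return instructions / cycles


# Hypothetical counter values, NOT measurements from the code discussed above.
instructions = 8_000_000_000
cycles = 2_500_000_000

print(f"IPC = {ipc(instructions, cycles):.2f}")
```

What counts as a "practical limit" depends on the core's issue width and the instruction mix, so compare the result against figures for your specific CPU rather than a generic target.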