yvdriess 3 days ago

Something that could help is to use llvm-mca or similar to get an idea of the potential speedup.

Sesse__ 3 days ago | parent [-]

A basic block simulator like llvm-mca is unlikely to give useful information here, as memory access is going to play a significant part in the overall performance.
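
To put rough numbers on that: as far as I know, llvm-mca has no model of the cache hierarchy and treats loads as L1 hits, so its cycle estimate only covers the core-bound part. Below is a toy back-of-the-envelope sketch, not anything llvm-mca computes. The function name and every number in it are hypothetical, and it ignores overlapping misses (memory-level parallelism), so it overstates the stall cost.

```python
def cycles_per_iteration(core_cycles: float, loads: int,
                         miss_rate: float, miss_penalty: float) -> float:
    """Toy estimate: core-bound cycles plus average stall cycles from cache misses.

    Assumes misses are fully serialized, which real hardware often avoids.
    """
    return core_cycles + loads * miss_rate * miss_penalty


# Hypothetical loop: 4 core-bound cycles per iteration, 2 loads per iteration,
# and a 200-cycle penalty for a miss that goes all the way to DRAM.
static_estimate = cycles_per_iteration(4.0, 2, 0.0, 200.0)   # every load hits L1
with_misses = cycles_per_iteration(4.0, 2, 0.25, 200.0)      # 25% of loads miss

print(static_estimate, with_misses)
```

With these made-up inputs the memory term dwarfs the core-bound term, so shaving a cycle or two off the instruction schedule would barely move the total. Measuring with hardware performance counters on real data would be the way to check whether that is actually the case here.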