Remix.run Logo
eigenform 3 days ago

Don't know this for certain, but I always assumed that x86 implementations get away with this by predecoding cachelines. If you're going to do prefetching in parallel and decoupled from everything else, might as well move part of the work there too? (obviously not without cost - plus, you can identify branches early!)