mshockwave 16 hours ago
> In the case where one of the structs has just been computed, attempting to load it as a single 32-bit load can result in a store forwarding failure

It actually depends on the uArch; Apple silicon doesn't seem to have this restriction: https://news.ycombinator.com/item?id=43888005

> In a non-inlined, non-PGO scenario the compiler doesn't have enough information to tell whether the optimization is suitable.

I guess you're talking about stores and loads across function boundaries?

Trivia: the X86 backend in LLVM has a whole pass just to prevent this partial-store-to-load issue on Intel CPUs: https://github.com/llvm/llvm-project/blob/main/llvm/lib/Targ...
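For anyone following along, here is a rough sketch of the access pattern being discussed. The struct layout and the function names (`pair16`, `make_pair`, `equal_wide`, `equal_fieldwise`) are made up for illustration and are not from the article or from LLVM. It only shows the shape of the code: two narrow stores followed by one wide load covering both. Whether that actually stalls depends on the uArch, and an optimizing compiler may fold the whole thing away once everything is inlined.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical struct: two 16-bit fields that together occupy 32 bits. */
typedef struct {
    uint16_t lo;
    uint16_t hi;
} pair16;

_Static_assert(sizeof(pair16) == sizeof(uint32_t), "pair16 must be 4 bytes");

/* "Just computed": the struct is written with two separate 16-bit stores. */
void make_pair(pair16 *out, uint16_t a, uint16_t b) {
    out->lo = a;
    out->hi = b;
}

/* Wide comparison: each struct is read back with a single 32-bit load.
 * If the two 16-bit stores above are still in the store buffer, this load
 * spans both of them, which is the case some cores cannot forward. */
int equal_wide(const pair16 *x, const pair16 *y) {
    uint32_t xv, yv;
    memcpy(&xv, x, sizeof xv); /* memcpy avoids strict-aliasing problems */
    memcpy(&yv, y, sizeof yv);
    return xv == yv;
}

/* Field-wise comparison: each load matches the size of one earlier store,
 * so ordinary store-to-load forwarding can apply. */
int equal_fieldwise(const pair16 *x, const pair16 *y) {
    return x->lo == y->lo && x->hi == y->hi;
}
```

If `make_pair` and `equal_wide` live in different translation units, the compiler sees neither the stores nor the load when compiling the other side, which is the cross-function-boundary situation asked about above.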