loeg a day ago
Early x86 prefetchers would fetch two adjacent cache lines, so despite the 64-byte physical line size, in practice data in adjacent lines could cause false sharing. This is mostly historical, though it's still relatively common to use a 128-byte line size on x86. E.g., https://github.com/facebook/folly/blob/main/folly/lang/Align... (Sandy Bridge was a 2011 CPU). (Clang's implementation of the std versions of these constants uses 64 for both on x86: https://godbolt.org/z/r1fdYTWEn .)
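For illustration, here is a minimal sketch of what padding to 128 bytes looks like in C++. The names `kPairSize`, `PaddedCounter`, and `same_pair_block` are hypothetical, and 128 is simply the assumption discussed above (two 64-byte lines), not a value queried from the hardware or the standard library.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

// Assumption: treat a 128-byte aligned block (two 64-byte lines) as the
// unit to keep independent data apart. Hypothetical constant name.
constexpr std::size_t kPairSize = 128;

// Each counter gets its own 128-byte aligned slot, so two counters never
// land in the same 64-byte line or in the same aligned pair of lines.
struct alignas(kPairSize) PaddedCounter {
    std::atomic<std::uint64_t> value{0};
};

static_assert(sizeof(PaddedCounter) == kPairSize,
              "alignas pads the struct up to one full 128-byte block");
static_assert(alignof(PaddedCounter) == kPairSize,
              "each instance starts on a 128-byte boundary");

// Returns true when two addresses fall inside the same 128-byte aligned block.
inline bool same_pair_block(const void* a, const void* b) {
    const auto ua = reinterpret_cast<std::uintptr_t>(a);
    const auto ub = reinterpret_cast<std::uintptr_t>(b);
    return ua / kPairSize == ub / kPairSize;
}
```

With this layout, neighbouring elements of a `PaddedCounter` array sit in different 128-byte blocks, at the cost of 120 bytes of padding per 8-byte counter. Whether the extra 64 bytes over plain 64-byte alignment buys anything on a given CPU is exactly the open question here and would need measuring.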
menaerus 15 hours ago
I can't wrap my head around how triggering the L1 HW prefetcher, so that it loads two pairs of cache lines from L2 into L1, can cause false sharing. Perhaps what the fb experiment measured was an artifact of the L2 HW prefetcher, which takes advantage of the 128-byte data layout: