stinkbeetle 14 hours ago
> I can't wrap my head around on how is it that triggering the L1 HW prefetcher so that it loads two pairs of cache-lines, from L2 into L1, can cause false-sharing.

CPU0 stores to byte 0x10 and dirties CL0 (0x00-0x3F). CPU1 loads byte 0x50, which belongs to a different data structure in CL1 (0x40-0x7F), and its adjacent-line prefetcher also pulls in CL0, which is what the Pentium 4 did. CPU0's dirty copy of CL0 now has to be shared with CPU1, and CPU0's next store has to invalidate CPU1's copy again, so the two cores fight over a line that CPU1 never actually uses.

> Perhaps what fb experiment measured was the artifact of L2 HW prefetcher which takes advantage of 128-byte data layout:

Seems plausible.
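A quick sketch of the address arithmetic in the example above. It assumes 64-byte lines and an adjacent-line prefetcher that fetches the other line of a 128-byte-aligned pair (line N and line N ^ 1). The function names are made up for illustration.

```python
# Assumption: 64-byte cache lines; the adjacent-line prefetcher pairs
# lines within a 128-byte-aligned block (line N and line N ^ 1).
LINE_SIZE = 64

def line_index(addr: int) -> int:
    """Index of the 64-byte cache line containing addr."""
    return addr // LINE_SIZE

def line_range(addr: int) -> tuple[int, int]:
    """Inclusive first and last byte address of the line containing addr."""
    start = line_index(addr) * LINE_SIZE
    return start, start + LINE_SIZE - 1

def buddy_line(addr: int) -> int:
    """Index of the other line in the same 128-byte-aligned pair."""
    return line_index(addr) ^ 1

def prefetch_coupled(a: int, b: int) -> bool:
    """True if a and b sit in different lines of the same 128-byte pair,
    so an adjacent-line prefetch for one drags in the other's line."""
    return line_index(a) != line_index(b) and line_index(a) // 2 == line_index(b) // 2

print([hex(x) for x in line_range(0x10)])  # CPU0's store: ['0x0', '0x3f']
print(line_index(0x50), buddy_line(0x50))  # CPU1's load is in CL1, prefetch pulls CL0: 1 0
print(prefetch_coupled(0x10, 0x50), prefetch_coupled(0x10, 0x90))  # True False
```

Moving CPU1's data to 0x90 puts it in a different 128-byte pair, which is why padding hot per-core data to 128 bytes rather than 64 sidesteps this.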