Remix.run Logo
Neywiny 13 hours ago

What pattern in the data shows that's what's being measured? I would expect to see basically 0 latency between adjacent "cores" then since L1 is shared per thread?

monocasa 12 hours ago | parent [-]

Co resident threads might not get any speed up here since coherency instructions are functionally operations on the L2 cache.