adwn 6 hours ago
Yes, this can make sense if

- the value often doesn't require an update, and

- there's contention on the cache line, i.e., at least two cores frequently read or write that cache line.

But there are important details to consider:

1) The probing load must be atomic. Both the compiler and the processor are, in general, allowed to split non-atomic loads into two or more partial loads. Only atomic loads, even with relaxed ordering, are guaranteed not to return intermediate or mixed values from other atomic stores.

2) If the ordering on the read part of the atomic read-modify-write operation is not relaxed, the probing load must reflect this. For example, an acq-rel RMW op would require an acquire ordering on the probing read.
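To make points (1) and (2) concrete, here is a minimal C++ sketch of a probe-then-RMW update. The helper name `set_bits_if_needed` and the bit-flag use case are hypothetical, chosen only for illustration; the pattern is an atomic probing load followed by the RMW only when an update is actually needed.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical helper (not from the thread): set the bits in `mask` only if
// they are not all set already. Returns true if an RMW was actually issued.
inline bool set_bits_if_needed(std::atomic<std::uint64_t>& flags,
                               std::uint64_t mask) {
    // (1) The probe is an atomic load, so it cannot be split into partial
    //     loads or return a mixed value.
    // (2) The RMW below is acq_rel, so the probe uses acquire: a caller that
    //     takes the early-return path still gets acquire ordering.
    if ((flags.load(std::memory_order_acquire) & mask) == mask) {
        return false;  // nothing to update, so no write is issued
    }
    flags.fetch_or(mask, std::memory_order_acq_rel);
    return true;
}
```

If the RMW were relaxed, a relaxed probing load would be enough; the probe's ordering only has to match the read half of the RMW it stands in for.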
anematode 5 hours ago
Thanks for your insights. (2) makes sense to me, but for (1), on ARM64 can an aligned 64-bit store really tear in a 64-bit non-atomic load? The spec says "A write that is generated by a store instruction that stores a single general-purpose register and is aligned to the size of the write in the instruction is single-copy atomic" (B2.2.1)