Remix.run Logo
ot 4 days ago

I'm not sure which comment you're responding to, because I'm not talking about shared_ptr, but about how atomic operations in general are implemented on x86.

I don't believe that shared_ptr uses seq-cst because I can just look at the source code, and I know that inc ref is relaxed and dec ref is acq-rel, as they should be.

However, none of this makes a difference on x86, where RMW atomic operations all lower to the same instructions (like LOCK ADD). Loads also do not care about memory order, and stores sometimes do, and that was what my comment was about.

tialaramex 3 days ago | parent [-]

This thread is wondering why the MP shared_ptr is slower than SP shared_ptr, or in Rust where this distinction isn't compiler magic, why Arc is slower than Rc

So hence the sequentially consistent ordering doesn't come into the picture.

And yeah, no, you don't get the sequentially consistent ordering for free on x86. x86 has the total store order, but firstly that's not quite enough to deliver sequentially consistent semantics in the machine on its own and then also the compiler has barriers during optimisation and those are impacted too. So if you insist on this ordering (which to be clear again you almost never should, the fact it's the default in C++ is IMO a mistake) it does make a difference on x86.