Remix.run Logo
amluto 3 hours ago

> I suspect the problem isn't so much the lock prefix, but that the non-futex spinlock release just is a store, whereas a futex release has to be a RMW operation.

> I'm talking out of my ass here, but my guess is that the reason for the performance gain of the plain-store-is-a-spinlock-release on x86 comes from being able to do the release via the store buffer, without having to wait for exclusive ownership of the cache line.

I don’t think so. The CPU is pretty good about hiding that kind of latency — reading a contended cache line and doing a correctly predicted branch shouldn’t stall anything after it.

But LOCK and MFENCE are quite expensive.