gpderetta 13 hours ago

Very interesting. AFAIK the kernel explicitly gives consume semantics to READ_ONCE (and in fact it is not just a compiler barrier on Alpha), so technically lowering it to a relaxed operation is wrong.

Does rust have or need the equivalent of std::memory_order_consume? Famously this was deemed unimplementable in C++.

steveklabnik 13 hours ago | parent | next [-]

It wasn’t implemented for the same reason. Rust uses C++20 ordering.

gpderetta 13 hours ago | parent | next [-]

right, so I would expect that the equivalent of READ_ONCE is converted to an acquire in rust, even if slightly pessimal.

But the article says that the suggestion is to convert them to relaxed loads. Is the expectation to YOLO it and hope that the compiler doesn't break control and data dependencies?

bonzini 13 hours ago | parent [-]

There is a yolo way that actually works, which would be to change it to a relaxed load followed by an acquire signal fence.
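A minimal sketch of that pattern in Rust, for illustration only. The helper names `read_once_ptr` and `read_acquire_ptr` are hypothetical and are not the kernel's API; Rust's name for a signal fence is `compiler_fence`. Whether the first variant is actually sound under the language memory model is exactly what is disputed further down the thread.

```rust
use std::sync::atomic::{compiler_fence, AtomicPtr, Ordering};

/// Hypothetical READ_ONCE-style helper: relaxed load + compiler-only fence.
pub fn read_once_ptr<T>(slot: &AtomicPtr<T>) -> *mut T {
    // Relaxed load: no hardware barrier is requested.
    let p = slot.load(Ordering::Relaxed);
    // Compiler-only ("signal") fence: restricts compiler reordering of later
    // memory accesses across this point and emits no CPU fence instruction.
    // Ordering of dependent loads is left to the hardware's address
    // dependencies, which the language does not promise to preserve.
    compiler_fence(Ordering::Acquire);
    p
}

/// Conservative alternative: a real acquire load, ordered by the language
/// memory model itself, possibly at some cost on weakly ordered CPUs.
pub fn read_acquire_ptr<T>(slot: &AtomicPtr<T>) -> *mut T {
    slot.load(Ordering::Acquire)
}
```

Both helpers return the same pointer; they differ only in what ordering the caller may rely on when dereferencing it.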

loeg 13 hours ago | parent [-]

Is that any better than just using an acquire load?

gpderetta 13 hours ago | parent [-]

It is cheaper on ARM and POWER. But I'm not sure it is always safe. The standard has very complex rules for consume to make sure that the compiler didn't break the dependencies.

edit: and those rules were so complex that compiler writers decided they were not implementable or not worth it.

bonzini 9 hours ago | parent [-]

The rules were there to explain what optimizations remained possible. Here no optimization is possible at the compiler level, and only the processor retains freedom because we know it won't use it.

It is nasty, but it's very similar to how Linux does it (volatile read + __asm__("") compiler barrier).

comex 8 hours ago | parent | next [-]

This is still unsound (in both C and Rust), because the compiler can break data dependencies by e.g. replacing a value with a different value known to be equal to it. A compiler barrier doesn't prevent this. (Neither would a hardware barrier, but with a hardware barrier it doesn't matter if data dependencies are broken.) The difficulty of ensuring the compiler will never break data dependencies is why compilers never properly implemented consume. Yet at the same time, this kind of optimization is actually very rare in non-pathological code, which is why Linux has been able to get away with assuming it won't happen.
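To make the objection concrete, here is a hand-written sketch of the kind of rewrite being described. `FALLBACK` and both function names are hypothetical, and the second function is what an optimizer could in principle produce from the first (for instance after speculating on a likely pointer value); it is not something a particular compiler is known to emit for this code.

```rust
use std::sync::atomic::{compiler_fence, AtomicPtr, AtomicU32, Ordering};

// Hypothetical well-known object that the published pointer often refers to.
pub static FALLBACK: AtomicU32 = AtomicU32::new(7);

/// As written: the address of the second load is computed from `p`, so a CPU
/// that honors address dependencies orders the two loads.
///
/// # Safety
/// `slot` must hold a non-null pointer to a live `AtomicU32`.
pub unsafe fn read_as_written(slot: &AtomicPtr<AtomicU32>) -> u32 {
    let p = slot.load(Ordering::Relaxed);
    compiler_fence(Ordering::Acquire);
    let target: &AtomicU32 = unsafe { &*p };
    target.load(Ordering::Relaxed)
}

/// Same observable result in a single thread, but on the fast path the second
/// load uses a constant address. Only a branch (a control dependency) links it
/// to the first load, and the compiler fence does nothing to forbid this.
///
/// # Safety
/// Same requirement as `read_as_written`.
pub unsafe fn read_as_possibly_transformed(slot: &AtomicPtr<AtomicU32>) -> u32 {
    let p = slot.load(Ordering::Relaxed);
    compiler_fence(Ordering::Acquire);
    let fallback_ptr = &FALLBACK as *const AtomicU32 as *mut AtomicU32;
    if p == fallback_ptr {
        // Address no longer derived from `p`: the data dependency is gone.
        FALLBACK.load(Ordering::Relaxed)
    } else {
        let target: &AtomicU32 = unsafe { &*p };
        target.load(Ordering::Relaxed)
    }
}
```

With an acquire load in place of the relaxed load and fence, both versions would be equally correct, because the ordering would no longer rest on the dependency.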

gpderetta 6 hours ago | parent | prev [-]

In principle a compiler could convert the data dependency into a control dependency (for example, with PGO, by first checking against the most likely value), and those are fairly fragile.

I guess in practice mainstream compilers do not do it and relaxed+signal fence works for now, but the fact that compilers have been reluctant to use it to implement consume means that they are reluctant to commit to it.

In any case I think you work on GCC, so you probably know the details better than me.

edit: it seems that ARM specifically does not respect control dependencies. But I might be misreading the memory model.

Fulgen 9 hours ago | parent | prev [-]

C++20 actually [changed the semantics of consume](https://devblogs.microsoft.com/oldnewthing/20230427-00/?p=10...), but Rust doesn't include it. And last I remember compilers still treat it as acquire, so it's not worth the bytes it's stored in.

jcranmer 7 hours ago | parent [-]

In the current drafts of C++ (I don't know which version it landed in), memory_order::consume is fully dead and listed as deprecated in the standard.

loeg 13 hours ago | parent | prev [-]

Does anything care about Alpha? The platform hasn't been sold in 20 years.

jcranmer 12 hours ago | parent | next [-]

It's a persistent misunderstanding that release-consume is about Alpha. It's not; in fact, Alpha is one of the few architectures where release-consume doesn't help.

In a TSO architecture like x86 or SPARC, every "regular" memory load/store is effectively a release/acquire by default. Using release/consume or relaxed provides no extra speedup on these architectures. In weak memory models, you need to add in acquire barriers to get release/acquire semantics. But also, most weak memory models have a basic rule that a data-dependent load has an implicit ordering dependency on the values that computed it (most notably, loading *p has an implicit dependency on p).

The goal of release/consume is to be able to avoid having an acquire fence if you have only those dependencies--to promote a hardware data dependency semantic rule to a language-level semantic rule. For Alpha's ultra-weak model, you still need the acquire fence in this mode, so it doesn't help Alpha one whit. Unfortunately, for various reasons, no one has been able to work out a language-level semantics for consume that compilers are willing to implement (preserving data dependencies through optimizations is a lot more difficult than it appears), so all compilers have remapped consume to acquire, making it useless.
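For reference, a small sketch of the pointer-publication pattern that consume was aimed at, written with release/acquire because Rust has no consume ordering. The function name `publish_and_read` is made up for illustration. The reader's `Acquire` load is the spot where consume would have let weakly ordered hardware rely on the address dependency instead of an acquire barrier.

```rust
use std::ptr;
use std::sync::atomic::{AtomicPtr, Ordering};
use std::sync::Arc;
use std::thread;

/// Publishes `value` from a writer thread and reads it back through the
/// published pointer on the calling thread.
pub fn publish_and_read(value: u32) -> u32 {
    let slot = Arc::new(AtomicPtr::<u32>::new(ptr::null_mut()));
    let writer_slot = Arc::clone(&slot);

    let writer = thread::spawn(move || {
        // Initialize the object first, then publish its address.
        let p = Box::into_raw(Box::new(value));
        // Release: the initialization above is visible to any thread that
        // acquires this store.
        writer_slot.store(p, Ordering::Release);
    });

    // Reader: spin until the pointer is published.
    let p = loop {
        // Acquire pairs with the Release store. The only access that needs
        // ordering here is the dereference of `p`, a data-dependent load.
        let p = slot.load(Ordering::Acquire);
        if !p.is_null() {
            break p;
        }
        std::hint::spin_loop();
    };
    writer.join().unwrap();

    // SAFETY: `p` came from `Box::into_raw` in the writer, is non-null, and
    // ownership is taken back exactly once here.
    let boxed = unsafe { Box::from_raw(p) };
    *boxed
}
```

Calling `publish_and_read(42)` returns the value the writer thread stored.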

gpderetta 13 hours ago | parent | prev [-]

consume is trivial on alpha, it is the same as acquire (always needs a #LoadLoad). It is also the same as acquire (and relaxed) on x86 and SPARC (a plain load, #LoadLoad is always implied).

The only place where consume matters is on relaxed but not too relaxed architectures like ARM and POWER, where consume relies on the implicit #LoadLoad of control and data dependencies.

bonzini 12 hours ago | parent [-]

Also on alpha there's only store-store and full memory barriers. Acquire is very expensive.

gpderetta 6 hours ago | parent [-]

Indeed. On the other hand, ARM has recently added explicit load-acquire primitives which are relatively cheap, so converting a consume to an acquire is not a big loss (and Linus considered doing it for the kernel a while ago just to avoid having to think too hard about compiler optimizations).