mmastrac a day ago

The associated issue for comparing two u16s is interesting.

https://github.com/rust-lang/rust/issues/140167

ack_complete a day ago | parent | next [-]

I'm surprised there's no mention of store forwarding in that discussion. The -O3 codegen is bonkers, but the -O2 output is reasonable. In the case where one of the structs has just been computed, attempting to load it as a single 32-bit load can result in a store forwarding failure that would negate the benefit of merging the loads. In a non-inlined, non-PGO scenario the compiler doesn't have enough information to tell whether the optimization is suitable.
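A minimal sketch of the pattern in question, assuming the struct layout from the issue (two 16-bit fields, no padding). The names `store_fields` and `compute_then_compare` are hypothetical, and `memcpy` stands in for the merged 32-bit load. The sketch also assumes `store_fields` is not inlined, so the two narrow stores really do hit memory before the wide load:

```c
#include <stdint.h>
#include <string.h>

typedef struct { uint16_t a, b; } pair;

/* The merged load below is only valid if the struct has no padding. */
_Static_assert(sizeof(pair) == sizeof(uint32_t), "pair must be 4 bytes");

/* Hypothetical producer: writes the struct with two separate 16-bit stores.
   Assume it is not inlined into its caller. */
void store_fields(pair *out, uint16_t a, uint16_t b) {
    out->a = a; /* 16-bit store */
    out->b = b; /* 16-bit store */
}

/* Compares a freshly written pair against another one using a single
   32-bit load per operand. The load of `p` spans both of the narrow
   stores above, which is the case where forwarding may fail. */
int compute_then_compare(uint16_t a, uint16_t b, const pair *other) {
    pair p;
    store_fields(&p, a, b);

    uint32_t pv, ov;
    memcpy(&pv, &p, sizeof pv);    /* wide load right after narrow stores */
    memcpy(&ov, other, sizeof ov);
    return pv == ov;
}
```

Whether the wide load actually stalls depends on the microarchitecture; the code only shows the shape of the access pattern, not its cost.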

mshockwave 17 hours ago | parent | next [-]

> In the case where one of the structs has just been computed, attempting to load it as a single 32-bit load can result in a store forwarding failure

It actually depends on the uArch; Apple silicon doesn't seem to have this restriction: https://news.ycombinator.com/item?id=43888005

> In a non-inlined, non-PGO scenario the compiler doesn't have enough information to tell whether the optimization is suitable.

I guess you're talking about stores and loads across function boundaries?

Trivia: X86 LLVM creates a whole Pass just to prevent this partial-store-to-load issue on Intel CPUs: https://github.com/llvm/llvm-project/blob/main/llvm/lib/Targ...

Dylan16807 17 hours ago | parent | prev [-]

> In the case where one of the structs has just been computed, attempting to load it as a single 32-bit load can result in a store forwarding failure that would negate the benefit of merging the loads

Would that failure be significantly worse than separate loading?

Just negating the optimization wouldn't be much reason against doing it. A single load is simpler and in the general case faster.

heybales a day ago | parent | prev | next [-]

The thing I like most about this is that the discussion isn't just 14 pages of "I'm having this issue as well" and "Any updates on when this will be fixed?" As a web dev, GitHub issues kinda suck.

eterm a day ago | parent | next [-]

It was worse before emoji reactions were added, when 90% of messages were literally just "+1".

heybales a day ago | parent [-]

+1

NoMoreNicksLeft 6 hours ago | parent | prev [-]

Wonder if it's a poor interface issue. If people could just click a "me too" button that, instead of adding a full comment, added some minimal notation with their username at the bottom of the comment, 1) would people use it, and 2) would it be unobtrusive enough not to be annoying? It could even mute notifications for the me-toos.

rhdjsjebshjffn a day ago | parent | prev [-]

This just seems to illustrate the complexity of compiler authorship. I am not very sure C compilers are able to address this issue any better in the general case.

runevault a day ago | parent | next [-]

Keep in mind Rust is using the same backend as one of the main C compilers, LLVM. So if it is handling it any better that means the Clang developers handle it before it even reaches the shared LLVM backend. Well, or there is something about the way Clang structures the code that catches a pattern in the backend the Rust developers do not know about.

rhdjsjebshjffn 20 hours ago | parent [-]

I mean yeah, I just view Rust as the quality-oriented spear of western development.

Rust is absolutely an improvement over C in every way.

vlovich123 a day ago | parent | prev [-]

The Rust issue has people trying this with C code, and the compiler generates the same problematic output. This will get fixed, and it’ll help both C and Rust code.

runevault 20 hours ago | parent [-]

Out of curiosity just clang or gcc as well?

josephg 8 hours ago | parent [-]

I just tried it, and the problem is even worse in gcc.

Given this C code:

    #include <stdint.h>

    typedef struct { uint16_t a, b; } pair;

    int eq_copy(pair a, pair b) {
        return a.a == b.a && a.b == b.b;
    }
    int eq_ref(pair *a, pair *b) {
        return a->a == b->a && a->b == b->b;
    }

Clang generates clean code for the eq_copy variant, but complex code for the eq_ref variant. Gcc emits pretty complex code in both variants.

For example, here's eq_ref from gcc -O2:

    eq_ref:
        movzx   edx, WORD PTR [rsi]
        xor     eax, eax
        cmp     WORD PTR [rdi], dx
        je      .L9
        ret
    .L9:
        movzx   eax, WORD PTR [rsi+2]
        cmp     WORD PTR [rdi+2], ax
        sete    al
        movzx   eax, al
        ret

Have a play around: https://c.godbolt.org/z/79Eaa3jYf
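
For comparison, here is a rough sketch of the merged-load form the thread is hoping the compiler would produce, written out by hand. `eq_ref_merged` is a hypothetical name, not something from the issue, and the approach assumes the struct is exactly 4 bytes with no padding. `memcpy` is used so the 32-bit read stays well-defined regardless of alignment or aliasing rules:

```c
#include <stdint.h>
#include <string.h>

typedef struct { uint16_t a, b; } pair;

/* Comparing raw bytes only matches field-wise equality without padding. */
_Static_assert(sizeof(pair) == sizeof(uint32_t), "pair must be 4 bytes");

/* Field-by-field version, as in the comment above. */
int eq_ref(const pair *a, const pair *b) {
    return a->a == b->a && a->b == b->b;
}

/* Hand-merged version: read each struct as one 32-bit value and compare. */
int eq_ref_merged(const pair *a, const pair *b) {
    uint32_t av, bv;
    memcpy(&av, a, sizeof av);
    memcpy(&bv, b, sizeof bv);
    return av == bv;
}
```

Both functions return the same result for any pair of inputs under the stated layout assumption; whether the merged one is faster in practice is exactly what the store-forwarding discussion above is about.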