The associated issue for comparing two u16s is interesting.

https://github.com/rust-lang/rust/issues/140167

▲ ack_complete 4 months ago | parent | next [-]

I'm surprised there's no mention of store forwarding in that discussion. The -O3 codegen is bonkers, but the -O2 output is reasonable. In the case where one of the structs has just been computed, attempting to load it as a single 32-bit load can result in a store forwarding failure that would negate the benefit of merging the loads. In a non-inlined, non-PGO scenario the compiler doesn't have enough information to tell whether the optimization is suitable.
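For readers who want the access pattern spelled out, here is a minimal C sketch of the case described above: two narrow stores into a freshly computed struct, followed by one wide load of the same bytes. The names `store_halves` and `load_merged` are hypothetical and do not come from the linked issue. The `volatile` casts are only an assumption-laden way to discourage the compiler from merging the stores. Whether a real build produces this store/load mix depends on inlining and optimization level, and whether it stalls depends on the microarchitecture.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct { uint16_t a, b; } pair;

/* The merged load below only makes sense if the struct is exactly 4 bytes. */
static_assert(sizeof(pair) == sizeof(uint32_t), "pair must be 4 bytes, no padding");

/* Write each field with its own 16-bit store. The volatile casts are only
   here to discourage the compiler from combining the two stores. */
void store_halves(pair *p, uint16_t a, uint16_t b) {
    *(volatile uint16_t *)&p->a = a;
    *(volatile uint16_t *)&p->b = b;
}

/* Read both fields back as one 32-bit value. memcpy sidesteps strict-aliasing
   problems; compilers commonly lower it to a single 32-bit load. */
uint32_t load_merged(const pair *p) {
    uint32_t bits;
    memcpy(&bits, p, sizeof bits);
    return bits;
}
```

Calling `load_merged` right after `store_halves` on the same object is the situation where the wide load may not be satisfiable from either pending 16-bit store alone.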

▲ mshockwave 4 months ago | parent | next [-]

> In the case where one of the structs has just been computed, attempting to load it as a single 32-bit load can result in a store forwarding failure

It actually depends on the uArch. Apple silicon doesn't seem to have this restriction: https://news.ycombinator.com/item?id=43888005

> In a non-inlined, non-PGO scenario the compiler doesn't have enough information to tell whether the optimization is suitable.

I guess you're talking about stores and loads across function boundaries?

Trivia: LLVM's X86 backend has a whole pass just to prevent this partial-store-to-load issue on Intel CPUs: https://github.com/llvm/llvm-project/blob/main/llvm/lib/Targ...

▲ Dylan16807 4 months ago | parent | prev [-]

> In the case where one of the structs has just been computed, attempting to load it as a single 32-bit load can result in a store forwarding failure that would negate the benefit of merging the loads

Would that failure be significantly worse than separate loading?

Just negating the optimization wouldn't be much reason against doing it. A single load is simpler and in the general case faster.

▲ ack_complete 4 months ago | parent | next [-]

Usually, yeah, it's noticeably worse than using individual loads and stores as it adds around a dozen cycles of latency. This is usually enough for the load to light up hot in a sampling profile. It's possible for that extra latency to be hidden, but then in that case the extra loads/stores wouldn't be an issue either.

▲ heybales 4 months ago | parent | prev | next [-]

The thing I like most about this is that the discussion isn't just 14 pages of "I'm having this issue as well" and "Any updates on when this will be fixed?" As a web dev, GitHub issues kinda suck.

▲ eterm 4 months ago | parent | next [-]

It was worse before emoji reactions were added, when 90% of messages were literally just "+1".

▲ heybales 4 months ago | parent [-]

+1

▲ NoMoreNicksLeft 4 months ago | parent | prev [-]

I wonder if it's a poor interface issue. If people could just click a "me too" button that, instead of adding a full comment, appended their username as a minimal notation at the bottom of the comment, 1) would people use it, and 2) would it be unobtrusive enough not to be annoying? It could even mute notifications for the me-toos.

▲ IshKebab 3 months ago | parent [-]

This seems like an area where LLMs would actually be extremely useful. You can manually mark comments as irrelevant. Why can't GitHub use AI to do it automatically? Or to highlight the "resolution" comment automatically? On very big issues it can take a non-trivial amount of time just to find out what the outcome was.

▲ rhdjsjebshjffn 4 months ago | parent | prev [-]

This just seems to illustrate the complexity of compiler authorship. I am not very sure C compilers are able to address this issue any better in the general case.

▲ runevault 4 months ago | parent | next [-]

Keep in mind Rust is using the same backend as one of the main C compilers, LLVM. So if it is handling it any better that means the Clang developers handle it before it even reaches the shared LLVM backend. Well, or there is something about the way Clang structures the code that catches a pattern in the backend the Rust developers do not know about.

▲ rhdjsjebshjffn 4 months ago | parent [-]

I mean yeah, I just view Rust as the quality-oriented spear of western development. Rust is absolutely an improvement over C in every way.

▲ vlovich123 4 months ago | parent | prev [-]

The Rust issue has people trying this with C code, and the compiler generates the same problematic output. This will get fixed, and it'll help both C and Rust code.

▲ runevault 4 months ago | parent [-]

Out of curiosity, just clang, or gcc as well?

▲ josephg 4 months ago | parent [-]

I just tried it, and the problem is even worse in gcc. Given this C code:

```c
#include <stdint.h>

typedef struct { uint16_t a, b; } pair;

/* Structs passed by value */
int eq_copy(pair a, pair b) {
    return a.a == b.a && a.b == b.b;
}

/* Structs passed by pointer */
int eq_ref(pair *a, pair *b) {
    return a->a == b->a && a->b == b->b;
}
```

Clang generates clean code for the eq_copy variant, but complex code for the eq_ref variant. Gcc emits pretty complex code in both variants. For example, here's eq_ref from gcc -O2:

```asm
eq_ref:
        movzx   edx, WORD PTR [rsi]
        xor     eax, eax
        cmp     WORD PTR [rdi], dx
        je      .L9
        ret
.L9:
        movzx   eax, WORD PTR [rsi+2]
        cmp     WORD PTR [rdi+2], ax
        sete    al
        movzx   eax, al
        ret
```

Have a play around: https://c.godbolt.org/z/79Eaa3jYf
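As a point of comparison for experiments like the one above, here is a hedged sketch of a hand-written whole-struct comparison next to the field-by-field version. The names `eq_fields` and `eq_bytes` are hypothetical and not from the thread. It assumes the struct has no padding, which the static assertion checks. It makes no claim about what any particular compiler emits for either function.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct { uint16_t a, b; } pair;

/* Comparing raw bytes is only equivalent to comparing fields
   when there are no padding bytes in the struct. */
static_assert(sizeof(pair) == 2 * sizeof(uint16_t), "pair must have no padding");

/* Field-by-field comparison, same shape as eq_ref in the comment above. */
int eq_fields(const pair *x, const pair *y) {
    return x->a == y->a && x->b == y->b;
}

/* Whole-object comparison: all 4 bytes at once via memcmp. */
int eq_bytes(const pair *x, const pair *y) {
    return memcmp(x, y, sizeof *x) == 0;
}
```

Both functions return the same result for every input, so pasting them side by side into Compiler Explorer is one way to see whether a given compiler treats them alike.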