▲ | yshui 14 hours ago
That's a cool find. I wonder if LLVM also does the reverse: pattern-matching handwritten CAS loops and transforming them into native ARM64 atomic instructions.
▲ | tialaramex 3 hours ago
The term of art for this technique is "idiom recognition", and it's proper ancient: APL compilers were already doing some idiom recognition 50+ years ago. An example you'll see in, say, a modern C compiler: if you write the obvious loop to count how many bits are set in an int, the machine code emitted for a brand new CPU should be a single population count instruction. Standard C has traditionally provided neither intrinsics (as Rust does) nor a dedicated "popcount" feature, so you can't write that directly, but it's obviously what you want here, and yup, an optimising C compiler will do that. However, LLVM is dealing with IR generated by other compiler folk, so I think it probably has less use for idiom recognition. Clang would do the recognition and lower to the same LLVM IR that Rust emits for its population count intrinsic, core::intrinsics::ctpop, so the LLVM backend doesn't need to spot this. I might be wrong, but I think that's how it works.
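To make the "obvious loop" concrete, here is a minimal sketch in Rust of a bit-counting loop next to the standard library's `u32::count_ones`. The function name `popcount_loop` is hypothetical, and whether a given compiler version, optimisation level, and target actually collapses the loop into a single instruction is an assumption to check on Godbolt rather than something this sketch demonstrates.

```rust
/// Hand-written population count (hypothetical helper for illustration).
/// Each iteration clears the lowest set bit, so the loop runs once per set bit.
fn popcount_loop(mut x: u32) -> u32 {
    let mut count = 0;
    while x != 0 {
        x &= x - 1; // clear the lowest set bit; x is non-zero here, so no underflow
        count += 1;
    }
    count
}

/// The same result via the standard library's population count method.
fn popcount_builtin(x: u32) -> u32 {
    x.count_ones()
}
```

Both functions return the same value for every input, for example 3 for `0b1011`; the only question idiom recognition answers is whether the first compiles to the same machine code as the second.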
▲ | jerrinot 13 hours ago
That's a very good question. A proper compiler engineer would know, but I will do my best to find something and report back. Edit: I could not find any pass that pattern-matches CAS loops and replaces them. The closest thing I could find is this pass: https://github.com/llvm/llvm-project/blob/06fb26c3a4ede66755... I reckon one could write a similar pass to recognize CAS idioms, but its usefulness would probably be rather limited and not worth the effort or the risk.
▲ | Arnavion 8 hours ago
I checked Godbolt, with RISC-V instead of ARM since I'm more familiar with that, and it doesn't look like it. https://gcc.godbolt.org/z/b5s4WjnTG (amomax is the atomic fetch-max instruction. lr and sc are the load-reserved and store-conditional instructions; sc is like a regular store except that it only succeeds if the address has not been modified since the previous lr that accessed it. In other words, the assembly is basically one-to-one with the C source.)
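For readers who want to see the shape of the loop under discussion, here is a minimal sketch of a handwritten CAS-based fetch-max, written in Rust rather than the C used in the Godbolt link (it is not the linked source). The name `fetch_max_cas` and the choice of memory orderings are assumptions made for illustration; `AtomicU64::compare_exchange_weak` and `AtomicU64::fetch_max` are the standard library calls.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hand-written fetch-max (hypothetical helper): retry a compare-and-swap
/// until it succeeds, storing max(current, val). Returns the previous value,
/// mirroring the return convention of `AtomicU64::fetch_max`.
fn fetch_max_cas(a: &AtomicU64, val: u64) -> u64 {
    let mut cur = a.load(Ordering::Relaxed);
    loop {
        let new = cur.max(val);
        // The weak variant may fail spuriously, which is fine inside a retry loop.
        match a.compare_exchange_weak(cur, new, Ordering::SeqCst, Ordering::Relaxed) {
            Ok(prev) => return prev,
            // Another thread changed the value (or the CAS failed spuriously):
            // retry with the value that was actually observed.
            Err(actual) => cur = actual,
        }
    }
}

/// The same operation via the standard library, which leaves the choice
/// between a loop and a native instruction to the compiler and target.
fn fetch_max_native(a: &AtomicU64, val: u64) -> u64 {
    a.fetch_max(val, Ordering::SeqCst)
}
```

The question in this thread is whether a compiler would turn something like the first function into the instruction it emits for the second; the Godbolt result above suggests it does not, at least for that C source and target.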