comex 7 hours ago
Incidentally, this automatic branch-if-zero from LLVM is being improved.

First of all, a recent LLVM patch apparently changes codegen to use CMOV instead of a branch: https://github.com/llvm/llvm-project/pull/102885

Beyond that, Intel recently updated their manual to retroactively define the behavior of BSR/BSF on zero inputs: the destination register is left unmodified. This matches the AMD manual, and I suspect it matches the behavior of all existing x86-64 processors (but that will need to be tested, I guess).

If so, you don't need either a branch or CMOV. Just set a register to 32, then run BSF with that same register as the destination. If the BSF input is nonzero, the 32 is overwritten with the trailing-zero count. If the BSF input is zero, BSF leaves the register unmodified and you get 32.

Since this behavior is now guaranteed for future x86-64 processors, and assuming it's indeed compatible with all existing x86-64 processors (maybe even all x86 processors, period?), LLVM will no longer need the old path regardless of what it's targeting.

Note that if you're targeting a newer x86-64 version, LLVM will just emit TZCNT, which does what you'd expect and returns 32 if the input is zero (or 64 for a 64-bit TZCNT). But as the blog post demonstrates, many people still build for baseline x86_64.

(Intel does document one discrepancy between processors: "On some older processors, use of a 32-bit operand size may clear the upper 32 bits of a 64-bit destination while leaving the lower 32 bits unmodified.")
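For what it's worth, here is a rough C sketch of the preload trick: put 32 in the destination, then scan into it. It uses BSF, the forward scan, since that is the instruction that yields a trailing-zero count. The names `bsf_model` and `ctz32_via_bsf` are made up for illustration, the inline asm assumes GCC or Clang on x86, and the whole thing assumes the CPU really does leave the destination unmodified on a zero input. On other targets it falls back to a plain software model of that behavior.

```c
#include <stdint.h>

/* Portable model of the documented BSF behavior: if src is zero the
   destination keeps its old value, otherwise it receives the index of
   the lowest set bit. Hypothetical helper, for illustration only. */
static uint32_t bsf_model(uint32_t dst, uint32_t src)
{
    if (src == 0)
        return dst;                /* destination left unmodified */
    uint32_t index = 0;
    while ((src & 1u) == 0) {      /* scan upward from bit 0 */
        src >>= 1;
        index++;
    }
    return index;
}

/* Trailing-zero count that yields 32 for a zero input, with no branch
   or CMOV wrapped around the scan instruction. */
static uint32_t ctz32_via_bsf(uint32_t x)
{
    uint32_t result = 32;          /* preload the answer wanted for x == 0 */
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    /* "+r" marks result as both input and output, so the compiler must
       keep the preloaded 32 in the destination register.
       ASSUMPTION: the CPU leaves the destination unmodified when x == 0. */
    __asm__("bsfl %1, %0" : "+r"(result) : "rm"(x) : "cc");
#else
    result = bsf_model(result, x); /* software stand-in on non-x86 targets */
#endif
    return result;
}
```

Calling `ctz32_via_bsf(8)` should give 3 and `ctz32_via_bsf(0)` should give 32. Since the result is only 32 bits wide, the upper-half discrepancy Intel mentions for 64-bit destinations shouldn't matter here.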
hinkley 5 hours ago | parent
I was watching a video ranting about bad benchmarks yesterday, and in an aside they pointed out that the (gcc) generated code used Conditional Move (cmov) in several places to handle an if/else if in the code with no branches. I think the days of trying to avoid branches by removing conditional assignments are either gone or close to it. You may still have a subsequent data race, but the conditional assignment isn't your biggest problem with throughput.
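As a rough illustration of the kind of code meant here, this is a small if / else if chain made only of conditional assignments. The function name `clamp_i32` is made up for the example. Optimizing compilers will often lower something like this to cmov on x86-64, but that is not guaranteed: it depends on the compiler, the flags, and the target, so check the assembly (for instance with `-S`) before relying on it.

```c
#include <stdint.h>

/* Clamp v into [lo, hi] using an if / else if chain. Each arm is a plain
   conditional assignment, which is the pattern compilers can often turn
   into conditional moves instead of branches. */
static int32_t clamp_i32(int32_t v, int32_t lo, int32_t hi)
{
    int32_t r = v;
    if (v < lo)
        r = lo;        /* below the range: take the lower bound */
    else if (v > hi)
        r = hi;        /* above the range: take the upper bound */
    return r;
}
```

The source is written in the obvious branching style; the point is that hand-rewriting it into bit tricks is usually not needed to get branch-free output.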