adrian_b 6 months ago
Intel has had LZCNT and TZCNT (leading-zero count and trailing-zero count), which replace the wrongly defined BSR and BSF, only since Haswell (June 2013). AMD has had LZCNT since Barcelona (September 2007) and TZCNT since Piledriver (May 2012).

The author made the mistake of not using the right compilation options for the CPU to enable LZCNT and TZCNT. It is very likely that the author's CPU supports these instructions, unless it is an older Intel Atom CPU, up to Tremont. Had the author compiled the program correctly, there would have been no branches from the beginning.

When Intel added the BSF and BSR instructions to the 80386 in 1985, it made a very serious mistake in their definition (the result is undefined when the source operand is zero), even though it should have followed the example of much older ISAs, where these instructions were defined correctly.

AMD defined LZCNT and TZCNT to correct Intel's mistake. To ensure backward compatibility, however, the corrected instructions use an additional prefix that older CPUs ignore, instead of a new encoding. This makes the encoding of these instructions longer than it should be.
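To make the point about the zero case concrete, here is a minimal C sketch, not the code from the article under discussion. The helper names `tz32`, `lz32`, and `demo_checks` are hypothetical. It assumes GCC or Clang, whose `__builtin_ctz` and `__builtin_clz` are undefined for a zero argument, so the zero case has to be handled explicitly.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers, for illustration only.
   Trailing-zero count that is defined for x == 0 (returns 32),
   which is the behavior TZCNT provides in hardware. */
static inline unsigned tz32(uint32_t x) {
    return x ? (unsigned)__builtin_ctz(x) : 32u;
}

/* Leading-zero count that is defined for x == 0 (returns 32),
   which is the behavior LZCNT provides in hardware. */
static inline unsigned lz32(uint32_t x) {
    return x ? (unsigned)__builtin_clz(x) : 32u;
}

/* Small self-check of the semantics described above. */
void demo_checks(void) {
    assert(tz32(8u) == 3);           /* 0b1000 has three trailing zeros */
    assert(lz32(1u) == 31);          /* only bit 0 is set */
    assert(lz32(0x80000000u) == 0);  /* top bit is set */
    assert(tz32(0u) == 32);          /* zero input is well defined here */
    assert(lz32(0u) == 32);
}
```

On x86-64, GCC and Clang accept `-mbmi` (for TZCNT) and `-mlzcnt` (for LZCNT), or a `-march=` value that implies them, such as `-march=haswell` or `-march=native` on a CPU that has them. With these options the explicit zero test can usually be folded into a single instruction. Without them the compiler has to keep a test or conditional move around BSF/BSR. Whether that happens for a given compiler version is best confirmed by reading the generated assembly.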
hn3er1q 6 months ago
Interesting. What compiler options would you have used? Do you know if the options are applicable to ARM as well?