camel-cdr 5 days ago
nth_set_bit_u64: wouldn't that be __builtin_ctzll(_pdep_u64(1ULL<<n, v)) with BMI2?
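Why the expression works: PDEP takes the low-order bits of its source operand and deposits them, in order, into the positions of the set bits of the mask, zeroing everything else. With a source of 1<<n, only the n-th (0-based) set bit of v receives a 1, and ctz reads off its position. Below is a minimal Rust sketch of that reasoning using a slow, bit-by-bit software model of PDEP. `pdep_u64_soft` and `nth_set_bit_soft` are hypothetical names for illustration, not the hardware instruction or any library API. The model also shows one caveat: if v has n or fewer set bits, PDEP returns 0, and __builtin_ctzll(0) is undefined in C, whereas Rust's trailing_zeros returns 64.

```rust
/// Bit-by-bit software model of x86 PDEP (parallel bit deposit).
/// Hypothetical helper for illustration only; real code would use the BMI2 instruction.
pub fn pdep_u64_soft(src: u64, mut mask: u64) -> u64 {
    let mut result = 0u64;
    let mut k: u32 = 0; // index of the next source bit to deposit
    while mask != 0 {
        let lowest = mask & mask.wrapping_neg(); // isolate the lowest set bit of mask
        if (src >> k) & 1 == 1 {
            result |= lowest; // copy source bit k into that position
        }
        mask &= mask - 1; // clear the lowest set bit and move on
        k += 1;
    }
    result
}

/// Position of the n-th (0-based) set bit of v, or 64 if v has n or fewer set bits.
/// Assumes n < 64, like the 1<<n in the C expression.
pub fn nth_set_bit_soft(v: u64, n: u32) -> u32 {
    // Only bit n of the source is set, so only the n-th set bit of v receives a 1.
    pdep_u64_soft(1u64 << n, v).trailing_zeros()
}
```

For example, with v = 0b1011_0000 (set bits at positions 4, 5 and 7), nth_set_bit_soft(v, 2) is 7 and nth_set_bit_soft(v, 3) is 64.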

kwillets 5 days ago, replying to camel-cdr
That's my guess as well. Bitstring rank/select is a well-known problem, and both the BMI and non-BMI (Hacker's Delight) versions are available as a reference.
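As a portable (non-BMI) point of comparison, here is a minimal Rust sketch of rank and select on a single 64-bit word, built only on popcount (count_ones). The names `rank64` and `select64` are hypothetical, and the halving search in `select64` is one common technique, not necessarily the exact Hacker's Delight routine.

```rust
/// rank: number of set bits of v strictly below bit position i (i in 0..=64).
pub fn rank64(v: u64, i: u32) -> u32 {
    if i >= 64 {
        v.count_ones()
    } else {
        (v & ((1u64 << i) - 1)).count_ones()
    }
}

/// select: position of the n-th (0-based) set bit of v, or 64 if there is none.
/// Narrows the search window by halves (32, 16, ..., 1 bits) using popcounts.
pub fn select64(mut v: u64, mut n: u32) -> u32 {
    if n >= v.count_ones() {
        return 64;
    }
    let mut pos: u32 = 0;
    let mut width: u32 = 32;
    while width > 0 {
        let low = (v & ((1u64 << width) - 1)).count_ones();
        if n >= low {
            // The target bit lies above the low half: skip those bits.
            n -= low;
            v >>= width;
            pos += width;
        }
        width /= 2;
    }
    pos
}
```

The two are inverses in the usual sense: rank64(v, select64(v, k)) == k for every k below v.count_ones(), which makes a convenient sanity check against a BMI2 version.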

SkiFire13 5 days ago, replying to camel-cdr
That's assuming you're OK with your program not running on some older CPUs that lack BMI2.

zamadatix 5 days ago, replying to SkiFire13
That, and that you're not willing to entertain splitting the manual version out into a #[cfg(not(target_feature = "bmi2"))] fallback implementation. For something already down to ~1 ns, both of those may well be very reasonable assumptions, of course.
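A minimal Rust sketch of that compile-time split: the BMI2 version is only compiled when the target enables bmi2 (for example via -C target-feature=+bmi2 or -C target-cpu=native), and every other build gets a portable loop. The function name mirrors the thread, but both bodies are illustrative assumptions, not the original code.

```rust
/// Position of the n-th (0-based) set bit of v, or 64 if v has n or fewer set bits.
/// Assumes n < 64.
#[cfg(all(target_arch = "x86_64", target_feature = "bmi2"))]
pub fn nth_set_bit_u64(v: u64, n: u32) -> u32 {
    use std::arch::x86_64::_pdep_u64;
    // SAFETY: this item is only compiled when BMI2 is enabled for the whole target.
    let deposited = unsafe { _pdep_u64(1u64 << n, v) };
    deposited.trailing_zeros() // pdep yields 0 when v runs out of set bits, giving 64
}

/// Portable fallback with the same contract; needs no BMI2.
#[cfg(not(all(target_arch = "x86_64", target_feature = "bmi2")))]
pub fn nth_set_bit_u64(mut v: u64, n: u32) -> u32 {
    // Clear the lowest set bit n times; the lowest remaining bit is the answer.
    for _ in 0..n {
        v &= v.wrapping_sub(1);
    }
    v.trailing_zeros() // 64 once v has run out of set bits
}
```

Because the choice is baked in at compile time, a BMI2 build can fault with an illegal-instruction error on CPUs that lack it; runtime dispatch with std::is_x86_feature_detected!("bmi2") is the usual alternative when one binary has to cover both.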

Validark 5 days ago, replying to zamadatix
AMD machines prior to Zen 3 had a microcoded implementation of pdep and pext, so those instructions are relatively expensive on the earlier Zen machines (as well as on Bulldozer). Some people still have Ryzen 3000 series chips. On the Intel side, pdep has been fast since its introduction with Haswell in 2013, so pretty much everyone using Intel should be fine in this regard.

stouset 5 days ago, replying to camel-cdr
I believe the equivalent ARM64 instructions (BDEP/BEXT) are in SVE2, which isn’t yet supported on Apple’s M-series chips as of M4, sadly.