SUB has higher latency than XOR on some Intel CPUs. Here are latency (L) and throughput (T) measurements from the InstLatx64 project (https://github.com/InstLatx64/InstLatx64):
| GenuineIntel | ArrowLake_08_LC | SUB r64, r64 | L: 0.26ns = 1.00c | T: 0.03ns = 0.135c |
| GenuineIntel | ArrowLake_08_LC | XOR r64, r64 | L: 0.03ns = 0.13c | T: 0.03ns = 0.133c |
| GenuineIntel | GoldmontPlus | SUB r64, r64 | L: 0.67ns = 1.0c | T: 0.22ns = 0.33c |
| GenuineIntel | GoldmontPlus | XOR r64, r64 | L: 0.22ns = 0.3c | T: 0.22ns = 0.33c |
| GenuineIntel | Denverton | SUB r64, r64 | L: 0.50ns = 1.0c | T: 0.17ns = 0.33c |
| GenuineIntel | Denverton | XOR r64, r64 | L: 0.17ns = 0.3c | T: 0.17ns = 0.33c |
I couldn't find any AMD chips where the same is true.
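For anyone who wants to repeat the comparison on other rows, here is a rough Python sketch that reads the cycle counts out of rows in this format and reports the SUB minus XOR latency gap per core. The rows are transcribed from the table above with the vendor column dropped; `parse_rows` and `latency_gap` are names I made up for illustration, not part of InstLatx64.

```python
import re

# Rows transcribed from the measurements above (vendor column dropped).
ROWS = """\
ArrowLake_08_LC | SUB r64, r64 | L: 0.26ns= 1.00c | T: 0.03ns= 0.135c
ArrowLake_08_LC | XOR r64, r64 | L: 0.03ns= 0.13c | T: 0.03ns= 0.133c
GoldmontPlus | SUB r64, r64 | L: 0.67ns= 1.0 c | T: 0.22ns= 0.33 c
GoldmontPlus | XOR r64, r64 | L: 0.22ns= 0.3 c | T: 0.22ns= 0.33 c
Denverton | SUB r64, r64 | L: 0.50ns= 1.0 c | T: 0.17ns= 0.33 c
Denverton | XOR r64, r64 | L: 0.17ns= 0.3 c | T: 0.17ns= 0.33 c
"""

# Matches the cycle figure after '=', tolerating the optional space before 'c'.
CYCLES = re.compile(r"=\s*([\d.]+)\s*c")


def parse_rows(text):
    """Return {core: {mnemonic: (latency_cycles, throughput_cycles)}}."""
    table = {}
    for line in text.strip().splitlines():
        core, insn, lat, thr = [field.strip() for field in line.split("|")]
        mnemonic = insn.split()[0]  # "SUB" or "XOR"
        lat_c = float(CYCLES.search(lat).group(1))
        thr_c = float(CYCLES.search(thr).group(1))
        table.setdefault(core, {})[mnemonic] = (lat_c, thr_c)
    return table


def latency_gap(table):
    """Return {core: SUB latency minus XOR latency, in cycles}."""
    return {core: ops["SUB"][0] - ops["XOR"][0] for core, ops in table.items()}


if __name__ == "__main__":
    for core, gap in latency_gap(parse_rows(ROWS)).items():
        print(f"{core}: SUB is {gap:.2f} cycles slower than XOR")
```

The throughput column is parsed too, so the same helper can be used to check that the two instructions differ only in latency on these cores.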