timewizard | 20 hours ago
> That's just spilling with fewer steps.

Another way to say this is it's "more efficient."

> The executed uops should be the same.

And "more densely coded."
camel-cdr | 20 hours ago
Hm, I was wondering how the density compares, given that x86 has more complex encodings in general.

vaddps zmm1,zmm0,ZMMWORD PTR [r14] takes six bytes to encode: 62 d1 7c 48 58 0e

In SVE and RVV a load+add takes 8 bytes to encode (two 4-byte instructions).
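For anyone who wants to check where those six bytes go, here is a rough Python sketch that splits them into the EVEX fields. The field layout is my reading of the EVEX prefix description in the Intel SDM. The helper name `decode_evex` is made up for illustration, and it only handles the simple case here: a memory operand with no SIB byte and no displacement.

```python
def decode_evex(code: bytes) -> dict:
    """Split a 6-byte EVEX instruction (no SIB, no displacement) into fields.

    Hypothetical helper for illustration only; not a general x86 decoder.
    """
    if len(code) < 6 or code[0] != 0x62:
        raise ValueError("expected 0x62 EVEX escape plus at least 5 more bytes")
    p0, p1, p2, opcode, modrm = code[1:6]

    # R, B, R', vvvv and V' are stored inverted in the prefix.
    r = ((p0 >> 7) & 1) ^ 1
    b = ((p0 >> 5) & 1) ^ 1
    r2 = ((p0 >> 4) & 1) ^ 1
    vvvv = ((p1 >> 3) & 0xF) ^ 0xF
    v2 = ((p2 >> 3) & 1) ^ 1

    return {
        "length": len(code),
        "opcode": opcode,
        # L'L = 0, 1, 2 selects 128, 256, 512 bits (3 is reserved).
        "vector_bits": 128 << ((p2 >> 5) & 0b11),
        "mod": modrm >> 6,
        # Destination register number: R' and R extend ModRM.reg.
        "dest": (r2 << 4) | (r << 3) | ((modrm >> 3) & 0b111),
        # First source register number: V' extends vvvv.
        "src1": (v2 << 4) | vvvv,
        # Base register number: B extends ModRM.rm.
        "base": (b << 3) | (modrm & 0b111),
    }


fields = decode_evex(bytes.fromhex("62 d1 7c 48 58 0e"))
print(fields)
```

With the bytes above this reports dest 1 (zmm1), src1 0 (zmm0), base 14 (r14) and 512 vector bits, so the load and the add really do fit in one 6-byte instruction: a 4-byte prefix, one opcode byte and one ModRM byte.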