timewizard 20 hours ago

> That's just spilling with fewer steps.

Another way to say this is it's "more efficient."

> The executed uops should be the same.

And "more densely coded."

camel-cdr 20 hours ago | parent [-]

Hm, I was wondering how the density compares, given that x86 has more complex encodings in general.

vaddps zmm1,zmm0,ZMMWORD PTR [r14]

takes six bytes to encode:

62 d1 7c 48 58 0e

In SVE and RVV, a separate load plus an add takes 8 bytes to encode (two 4-byte instructions).
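To make the byte count concrete, here is a rough sketch that picks apart the six bytes quoted above as an EVEX-encoded instruction. The field layout follows my reading of the EVEX format in the Intel manuals, and the variable names (`dest`, `src1`, `base`, and so on) are my own labels, not anything from the thread. The 8-byte figure for SVE and RVV assumes two fixed-width 4-byte instructions, one load and one add.

```python
# Sketch: decode the EVEX bytes of  vaddps zmm1, zmm0, ZMMWORD PTR [r14]
# Field positions are an assumption based on the documented EVEX layout.
code = bytes.fromhex("62 d1 7c 48 58 0e")

assert code[0] == 0x62                  # EVEX escape byte
p0, p1, p2, opcode, modrm = code[1:]    # three payload bytes, opcode, ModRM

# P0: R, X, B, R' are stored inverted; the low bits select the opcode map.
r_ext = ((p0 >> 7) & 1) ^ 1
b_ext = ((p0 >> 5) & 1) ^ 1
opcode_map = p0 & 0b111                 # 1 means the 0F map

# P1: vvvv holds the first source register, stored inverted.
src1 = (~(p1 >> 3)) & 0xF

# P2: L'L selects the vector length (128 << L'L bits).
vector_bits = 128 << ((p2 >> 5) & 0b11)

# ModRM: mod=00 with this rm means a plain [base] memory operand.
mod = modrm >> 6
dest = ((modrm >> 3) & 0b111) | (r_ext << 3)
base = (modrm & 0b111) | (b_ext << 3)

x86_bytes = len(code)                   # one fused load+add instruction
risc_bytes = 2 * 4                      # assumed: separate load and add, 4 bytes each

print(f"opcode 0x{opcode:02x}, map {opcode_map}, zmm{dest} = zmm{src1} + [r{base}], "
      f"{vector_bits}-bit, {x86_bytes} bytes vs {risc_bytes} bytes")
```

In this one case the fused x86 form comes out two bytes shorter. That is a single data point, not a general statement about code density across the ISAs.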