Remix.run Logo
ack_complete 5 months ago

Sure, but any core-wide downclocking effect at all is annoying for autovectorization, since a small local win easily turns into a global loss. Which is why compilers have "prefer vector width" tuning parameters so autovec can be tuned down to avoid 512-bit or even 256-bit ops.

This is also the same reason that having AVX-512 only on the P-cores wouldn't have worked, even with thread director support. It would only take one small routine in a common location to push most threads off the P-cores.

I'm of the opinion that Intel's hybrid P/E-arch has been mostly useless anyway and only good for winning benchmarks. My current CPU has a 6P4E configuration and the scheduler hardly uses the E-cores at all unless forced, plus performance was better and more stable with the E-cores disabled.