Remix.run Logo
dzaima 20 hours ago

Intel still shares ports between vector and scalar on P-cores; a scalar multiply in the loop will definitely fight with a vector port, and the bits of pointer bumps and branch and whatnot can fill up the 1 or 2 scalar-only ports. And maybe there are some minor power savings from wasting resources on the scalar overhead. Still, clang does unroll way too much.

Remnant44 19 hours ago | parent [-]

My understanding is that they've changed this for Lion Cove and all future P cores, moving to much more of a Zen-like setup with seperate schedulers and ports for vector and scalar ops.

dzaima 18 hours ago | parent [-]

Oh, true, mistook it for an E-core while clicking through diagrams due to the port spam.. Still, that's a 2024 microarchirecture, it'll be like a decade before it's reasonable to ignore older ones.