| ▲ | brigade 2 hours ago | |
ARM favored wider ILP and mostly symmetric ALUs, while x86 favored wider, asymmetric ALUs. Most high-end ARM cores were 4x128b FMA, and Cortex-X925 goes to 6x128b FMA. Contrast that with Intel, which was 2x256b FMA for the longest time, then 2x512b FMA, with another 1-2 pipelines that can't do FMA. But ultimately, 4x128b ≈ 2x256b, and 2x256b < 6x128b < 2x512b in peak FMA throughput. Permute is a different factor though, if your algorithm cares about it. | ||
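To make the comparison concrete, here's a rough back-of-the-envelope sketch. It assumes peak FMA throughput per cycle is just pipes × vector width, and ignores clocks, the non-FMA pipelines, permute, and memory bandwidth. The helper name `fma_bits_per_cycle` is made up for illustration.

```python
# Peak FMA datapath width per cycle, assuming every FMA pipe issues
# one full-width vector FMA each cycle (an idealized upper bound).
def fma_bits_per_cycle(pipes: int, width_bits: int) -> int:
    return pipes * width_bits

# Configurations named in the comment above.
configs = {
    "4x128b": fma_bits_per_cycle(4, 128),  # 512 bits/cycle
    "2x256b": fma_bits_per_cycle(2, 256),  # 512 bits/cycle
    "6x128b": fma_bits_per_cycle(6, 128),  # 768 bits/cycle
    "2x512b": fma_bits_per_cycle(2, 512),  # 1024 bits/cycle
}

# Same numbers expressed as FP32 lanes per cycle (32 bits per lane).
fp32_lanes = {name: bits // 32 for name, bits in configs.items()}

for name, bits in configs.items():
    print(f"{name}: {bits} bits/cycle, {fp32_lanes[name]} FP32 FMAs/cycle")
```

So 4x128b and 2x256b tie at 512 bits/cycle, 6x128b lands at 768, and 2x512b at 1024, which matches the ordering above.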