fanf2 4 days ago

Apple’s ARM cores have wider decode than x86

M1 - 8 wide

M4 - 10 wide

Zen 4 - 4 wide

Zen 5 - 8 wide

adgjlsfhk1 3 days ago

Pure decoder width doesn't tell you everything. x86 has some commonly used, ridiculously compact instructions (e.g. LEA) that would turn into 2-3 instructions on most other architectures.
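To make the LEA point concrete, here is a small Python model (an illustrative sketch, not an emulator; the function names are made up) of what one x86-64 instruction such as lea rax, [rdi + rsi*4 + 12] computes, next to the same arithmetic built from simple RISC-style ops. Base RISC-V needs a shift, an add and an add-immediate (3 ops); AArch64 can fold the shift into its add, which gives 2.

```python
MASK64 = (1 << 64) - 1  # results wrap to 64 bits like a GPR


def lea(base: int, index: int, scale: int, disp: int) -> int:
    """One x86-64 LEA: base + index*scale + disp, with no memory access."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("x86 scale must be 1, 2, 4 or 8")
    return (base + index * scale + disp) & MASK64


def risc_sequence(base: int, index: int, scale: int, disp: int) -> tuple[int, int]:
    """Same result from simple ops; returns (result, number of ops used)."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be a power of two up to 8")
    ops = 0
    shift = scale.bit_length() - 1      # scale 4 -> shift by 2
    tmp = index
    if shift:
        tmp = (tmp << shift) & MASK64   # slli (AArch64 folds this into add ..., lsl #n)
        ops += 1
    tmp = (base + tmp) & MASK64         # add
    ops += 1
    if disp:
        tmp = (tmp + disp) & MASK64     # addi
        ops += 1
    return tmp, ops
```

With the example operands, lea(1000, 5, 4, 12) and risc_sequence(1000, 5, 4, 12) both produce 1032, but the second takes three separate operations.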

ajross 3 days ago

The whole ModRM addressing encoding (to which LEA is basically a front end) is actually really compact, and compilers have gotten frighteningly good at exploiting it. Just look at the disassembly for some non-trivial code sometime and see what it's doing.
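As a rough illustration of that compactness, the sketch below hand-decodes the 5 bytes 48 8D 44 B7 0C, a hand-assembled encoding of lea rax, [rdi + rsi*4 + 12]: REX.W prefix, opcode 8D, then a ModRM byte, a SIB byte and an 8-bit displacement. It only handles this one ModRM/SIB form and ignores the REX register extensions, so treat it as a toy rather than a general x86 decoder; all names are made up for the example.

```python
# x86-64 register numbers 0-7 (REX.R/X/B extensions to r8-r15 are not handled here)
REGS = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def split_modrm(byte: int) -> tuple[int, int, int]:
    """ModRM = mod (2 bits) | reg (3 bits) | rm (3 bits)."""
    return byte >> 6, (byte >> 3) & 0b111, byte & 0b111


def split_sib(byte: int) -> tuple[int, int, int]:
    """SIB = scale (2 bits) | index (3 bits) | base (3 bits); scale returned as 1/2/4/8."""
    return 1 << (byte >> 6), (byte >> 3) & 0b111, byte & 0b111


def decode_lea64(code: bytes) -> tuple[str, int]:
    """Decode REX.W + 8D /r with SIB + disp8; returns (assembly text, length in bytes)."""
    if len(code) < 5 or code[0] != 0x48 or code[1] != 0x8D:
        raise ValueError("expected REX.W (0x48) + opcode 0x8D")
    mod, reg, rm = split_modrm(code[2])
    if mod != 0b01 or rm != 0b100:
        raise NotImplementedError("sketch only handles mod=01, rm=100 (SIB + disp8)")
    scale, index, base = split_sib(code[3])
    disp = int.from_bytes(code[4:5], "little", signed=True)  # disp8 is sign-extended
    addr = REGS[base]
    if index != 0b100:  # index=100 encodes "no index register"
        addr += f" + {REGS[index]}*{scale}"
    addr += f" + {disp}" if disp >= 0 else f" - {-disp}"
    return f"lea {REGS[reg]}, [{addr}]", 5
```

Five bytes buy a base register, a scaled index, a displacement and a destination, all in one instruction.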

monocasa 3 days ago

Additionally, stuff like RMW (read-modify-write) instructions are really at least three, maybe four or five, RISC instructions.
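A toy Python model of that claim, assuming a memory-destination add with a scaled index, e.g. add qword [rdi + rsi*8 + 8], rdx, against the same work on base RV64 RISC-V, which has no scaled-index addressing. The register assignment (a0 = base, a1 = index, a2 = source) and function names are assumptions for the example. With plain [reg + disp] addressing the RISC side drops to 3 (load, add, store). Inside the core the x86 instruction is still typically split into load, ALU and store micro-ops, so the saving is mainly in instruction count and decode bandwidth.

```python
MASK64 = (1 << 64) - 1


def x86_add_mem(mem: dict[int, int], base: int, index: int, scale: int,
                disp: int, src: int) -> int:
    """One x86 instruction: add qword [base + index*scale + disp], src.

    Returns the number of instructions used (always 1).
    """
    addr = (base + index * scale + disp) & MASK64
    mem[addr] = (mem.get(addr, 0) + src) & MASK64
    return 1


def riscv_add_mem(mem: dict[int, int], base: int, index: int, scale: int,
                  disp: int, src: int) -> int:
    """Same read-modify-write on base RV64 (no scaled-index addressing).

    Returns the number of instructions used.
    """
    shift = scale.bit_length() - 1
    t0 = (index << shift) & MASK64          # slli t0, a1, 3   (for scale 8)
    t0 = (base + t0) & MASK64               # add  t0, a0, t0
    addr = (t0 + disp) & MASK64
    t1 = mem.get(addr, 0)                   # ld   t1, 8(t0)
    t1 = (t1 + src) & MASK64                # add  t1, t1, a2
    mem[addr] = t1                          # sd   t1, 8(t0)
    return 5
```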

ack_complete 3 days ago

Yes, but so does ARM. ld1 {v0.16b, v1.16b, v2.16b, v3.16b}, [x0], #64 loads four 128-bit vector registers and post-increments a pointer register.
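For anyone who doesn't read AArch64, here is a Python model of what that single LD1 does, assuming memory is a flat bytes buffer and x0 is an offset into it (the function name is illustrative). LD1 with four registers fills v0 through v3 with 64 consecutive bytes (no de-interleaving, unlike LD4), then writes x0 + 64 back to the base register.

```python
def ld1_4x16b_post(mem: bytes, x0: int) -> tuple[list[bytes], int]:
    """Model of AArch64 `ld1 {v0.16b, v1.16b, v2.16b, v3.16b}, [x0], #64`.

    Returns the four 16-byte register values (v0..v3) and the updated x0.
    """
    regs = [mem[x0 + 16 * i : x0 + 16 * (i + 1)] for i in range(4)]
    if any(len(r) != 16 for r in regs):
        raise IndexError("load runs past the end of the buffer")
    # Post-index writeback: the base register advances by the 64 bytes just loaded.
    return regs, x0 + 64
```

Calling it twice on a 128-byte buffer walks the whole buffer, one instruction per 64 bytes, with no separate pointer add.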

kimixa 3 days ago

Also the op cache: if it hits, the decoder is skipped completely.
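A toy model of that effect: an LRU map from fetch address to already-decoded micro-ops, where the decoder callback (hypothetical, passed in by the caller) only runs on a miss. Real op caches are organized around fetch windows with per-line micro-op limits and other constraints, so this only illustrates why a hot loop stops paying for decode.

```python
from collections import OrderedDict
from typing import Callable, Sequence


class OpCache:
    """Toy micro-op cache: LRU map from fetch address to decoded micro-ops."""

    def __init__(self, capacity: int, decode: Callable[[int], Sequence[str]]):
        self.capacity = capacity
        self.decode = decode                      # hypothetical slow-path decoder
        self.lines: OrderedDict = OrderedDict()   # fetch address -> micro-ops
        self.decodes = 0                          # how often the decoder actually ran

    def fetch(self, addr: int) -> Sequence[str]:
        if addr in self.lines:                    # hit: decoder is bypassed entirely
            self.lines.move_to_end(addr)
            return self.lines[addr]
        self.decodes += 1                         # miss: pay for a full decode
        uops = self.decode(addr)
        self.lines[addr] = uops
        if len(self.lines) > self.capacity:       # evict least recently used entry
            self.lines.popitem(last=False)
        return uops
```

Running a three-instruction loop body 1000 times through OpCache(64, ...) calls the decoder only 3 times; every later iteration is served from the cache.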

ryuuchin 3 days ago

Is Zen 5 more like a 4x2 than a true 8 since it has dual decode clusters and one thread on a core can't use more than one?
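The distinction in the question can be put as a tiny model. It takes the question's premise (one thread can drive only one of Zen 5's two 4-wide clusters) as an assumption rather than a measured fact, and the class and field names are made up. Peak figures ignore op-cache hits and stalls.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Frontend:
    clusters: int             # number of decode clusters in the core
    width_per_cluster: int    # instructions per cycle each cluster can decode
    clusters_per_thread: int  # how many clusters one hardware thread may drive

    def peak_decode(self, active_threads: int) -> int:
        """Peak instructions decoded per cycle across all active threads."""
        usable = min(self.clusters, active_threads * self.clusters_per_thread)
        return usable * self.width_per_cluster
```

Under that premise, Frontend(2, 4, 1) gives 4 per cycle with one thread and 8 only with two SMT threads, whereas a single monolithic 8-wide decoder, Frontend(1, 8, 1), gives 8 either way.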

https://chipsandcheese.com/i/149874010/frontend

wmf 4 days ago

Skymont - 9 wide

mort96 3 days ago

Wow, I had no idea we were up to 8 wide decoders in amd64 CPUs.