| ▲ | torginus 3 days ago |
| There are just so many confounding factors that it's almost impossible to pin down what's going on:
- M-series chips have closely integrated RAM right next to the CPU, while AMD makes do with standard DDR5 far away from the CPU, which leads to a huge latency increase.
- I wouldn't be surprised if Apple CPUs (which have a mobile legacy) are much more efficient/faster at 'bursty' workloads: waking up, doing some work and going back to sleep.
- M-series chips are often designed for a lower clock frequency, and power consumption rises steeply with clock speed: dynamic power grows quadratically with supply voltage (due to capacitive charge/discharge losses on FETs), and higher clocks need higher voltage. Here's a diagram that shows this on a GPU: https://imgur.com/xcVJl1h

It's entirely possible that AArch64 is more efficient: the decode hardware is most likely simpler, and encoding efficiency seems identical (https://portal.mozz.us/gemini/arcanesciences.com/gemlog/22-0...?). But it's hard to tell how much that contributes to the end result. |
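The standard first-order model for the capacitive switching losses mentioned above is dynamic power, P ≈ α·C·V²·f. The Python sketch below assumes (hypothetically) that supply voltage has to rise linearly with clock frequency over the operating range, which makes power scale roughly with the cube of the clock. `dynamic_power` and `relative_power` are illustrative names, and the constants are arbitrary, not measurements of any real chip.

```python
def dynamic_power(alpha: float, capacitance: float, voltage: float, freq: float) -> float:
    """First-order CMOS switching power: P = alpha * C * V^2 * f.

    alpha: activity factor (fraction of the capacitance switched per cycle)
    capacitance: switched capacitance in farads
    voltage: supply voltage in volts
    freq: clock frequency in hertz
    """
    return alpha * capacitance * voltage ** 2 * freq


def relative_power(freq_scale: float) -> float:
    """Power relative to a baseline when the clock is scaled by freq_scale.

    Hypothetical assumption: voltage scales linearly with frequency, so
    P scales as freq_scale**2 (from V^2) * freq_scale (from f).
    """
    base = dynamic_power(0.2, 1e-9, 1.0, 3e9)
    scaled = dynamic_power(0.2, 1e-9, 1.0 * freq_scale, 3e9 * freq_scale)
    return scaled / base


for scale in (0.8, 1.0, 1.2):
    print(f"clock x{scale:.1f} -> power x{relative_power(scale):.3f}")
```

Under this assumption a 20% lower clock cuts switching power to about half, which is why a chip designed for lower clocks can look disproportionately efficient.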
| ▲ | magicalhippo 3 days ago |
| Zen 5 also seems to have a somewhat underperforming memory subsystem, from what I can gather. Hardware Unboxed just did an interesting video[1] comparing the gaming performance of the 7600X (Zen 4) and the 9700X (Zen 5), with the 9800X3D for reference. In some games the 9700X had a decent lead over the 7600X, but in others the performance was exactly the same, and then the 9800X3D would have a massive lead over the 9700X. For example, in the Horizon Zero Dawn benchmark the 7600X had 182 FPS and the 9700X had 185 FPS, yet the 9800X3D had a massive 266 FPS.
[1]: https://www.youtube.com/watch?v=emB-eyFwbJg |
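To put the quoted Horizon Zero Dawn numbers in proportion, here is a quick Python calculation of the relative uplifts, using only the three FPS figures from the comment (`uplift_percent` is just an illustrative helper):

```python
def uplift_percent(new_fps: float, old_fps: float) -> float:
    """Percentage improvement of new_fps over old_fps."""
    return (new_fps / old_fps - 1.0) * 100.0


# Horizon Zero Dawn figures quoted in the comment (from the linked video).
fps = {"7600X": 182, "9700X": 185, "9800X3D": 266}

zen5_vs_zen4 = uplift_percent(fps["9700X"], fps["7600X"])
x3d_vs_zen5 = uplift_percent(fps["9800X3D"], fps["9700X"])
print(f"9700X over 7600X:   {zen5_vs_zen4:.1f}%")
print(f"9800X3D over 9700X: {x3d_vs_zen5:.1f}%")
```

That works out to roughly a 1.6% generational gain versus roughly 44% from the extra cache.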
| ▲ | VHRanger 3 days ago |
| I mean, huge software with a ton of quirks, like a AAA video game, is arguably not a good benchmark for understanding hardware. They're still good benchmarks IMO because they represent a "real workload", but to understand why the 9800X3D performs this much better you'd want some metrics on CPU cache misses in the processors tested. It's often similar to hyperthreading: on very efficient software you sometimes actually want to turn SMT off, because two threads fighting over the same L2 cache space, which a single thread was already using efficiently, cause too many cache evictions. So software getting a huge speedup from an X3D model with a ton of cache might indicate that the software has a bad data layout and needs the huge cache because it keeps doing RAM round trips. You'd presumably also see large speedups in this case from faster RAM on the same processor. |
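A common back-of-the-envelope model for the effect described above is average memory access time: AMAT = hit time + miss rate × miss penalty. The Python sketch below uses made-up round-number latencies and miss rates (hypothetical, not measured on any of the CPUs discussed) to show that when the miss rate is high, both a bigger cache (lower miss rate) and faster RAM (lower miss penalty) help a lot, while a cache-friendly workload barely notices either.

```python
def amat(hit_time_ns: float, miss_rate: float, miss_penalty_ns: float) -> float:
    """Average memory access time for a single cache level in front of DRAM.

    AMAT = hit_time + miss_rate * miss_penalty
    """
    return hit_time_ns + miss_rate * miss_penalty_ns


# Hypothetical numbers, chosen only for illustration.
L3_HIT_NS = 10.0
DRAM_PENALTY_NS = 80.0
FASTER_DRAM_PENALTY_NS = 65.0

# (label, miss rate with a smaller L3, miss rate with a larger L3)
workloads = [
    ("cache-friendly", 0.02, 0.01),
    ("poor data layout", 0.40, 0.10),
]

for label, small_l3_miss, big_l3_miss in workloads:
    base = amat(L3_HIT_NS, small_l3_miss, DRAM_PENALTY_NS)
    bigger_cache = amat(L3_HIT_NS, big_l3_miss, DRAM_PENALTY_NS)
    faster_ram = amat(L3_HIT_NS, small_l3_miss, FASTER_DRAM_PENALTY_NS)
    print(f"{label:>16}: base {base:.1f} ns, "
          f"bigger L3 {bigger_cache:.1f} ns, faster RAM {faster_ram:.1f} ns")
```

With these assumed numbers the cache-friendly workload moves by well under a nanosecond either way, while the poorly laid-out one drops from 42 ns to 18 ns with the bigger cache and to 36 ns with faster RAM.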
| ▲ | magicalhippo 3 days ago |
| > but to understand why the 9800X3D performs this much better you'd want some metrics on CPU cache misses in the processors tested.

But as far as I can tell, the 9600X and the 9800X3D are the same except for the 3D cache and a higher TDP. However, they have similar peak extended power (~140W), and I don't see how the different TDP numbers would explain the differences between the 9600X and the 7600X, where the Zen 5 part is sometimes ahead and other times identical, while the 9800X3D beats both massively regardless. What factors besides fewer L3 cache misses could lead to 40+% better performance for the 9800X3D?

> You'd presumably also see large speedups in this case from faster RAM on the same processor.

That was precisely my point. Zen 5 seems to have a relatively slow memory path. If the M-series has a much better memory path, then Zen 5 is at a serious disadvantage for memory-bound workloads. Consider locally run CPU LLMs as a prime example: the M-series crushes AMD there. I found the gaming benchmark interesting because it represented workloads that just straddled the cache sizes, and thus showed how good Zen 5 could be if it had a much better memory subsystem. I'm happy to be corrected though. |
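To illustrate the "straddling the cache sizes" argument, here is a toy LRU cache simulation in Python. It is a deliberate simplification: a single fully associative LRU cache, a cyclic scan over a working set, and sizes in abstract "lines" that are only loosely scaled 1:3 like the 32 MB vs 96 MB of L3 on the 9700X and 9800X3D. `lru_miss_rate` and the working-set size are hypothetical. A working set just larger than the cache thrashes completely under LRU, while the same working set in a cache three times the size only takes the initial compulsory misses.

```python
from collections import OrderedDict


def lru_miss_rate(cache_lines: int, working_set: int, passes: int) -> float:
    """Miss rate of a fully associative LRU cache under a cyclic scan.

    The access pattern touches lines 0..working_set-1 in order, `passes` times.
    """
    cache = OrderedDict()  # keys are cached line numbers, oldest first
    misses = 0
    accesses = 0
    for _ in range(passes):
        for line in range(working_set):
            accesses += 1
            if line in cache:
                cache.move_to_end(line)  # mark as most recently used
            else:
                misses += 1
                cache[line] = None
                if len(cache) > cache_lines:
                    cache.popitem(last=False)  # evict least recently used
    return misses / accesses


# Sizes in abstract cache lines, loosely scaled 1:3 like 32 MB vs 96 MB of L3.
SMALL_L3, BIG_L3 = 32, 96
WORKING_SET = 48  # hypothetical working set that "straddles" the smaller cache

print(f"small L3: {lru_miss_rate(SMALL_L3, WORKING_SET, passes=10):.0%} misses")
print(f"big L3:   {lru_miss_rate(BIG_L3, WORKING_SET, passes=10):.0%} misses")
```

Real caches are set-associative and use smarter replacement than pure LRU, so the cliff is less sharp in practice, but the shape of the effect is the same.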
|
|
|
| ▲ | formerly_proven 3 days ago |
| > M-series chips have closely integrated RAM right next to the CPU, while AMD makes do with standard DDR5 far away from the CPU, which leads to a huge latency increase

2/3rds the speed of light must be very slow over there. |
| ▲ | torginus 3 days ago |
| I mean at 2GHz, and 2/3c, the signal travels about 10cm in 1 clock cycle. So it's not negligible, but I suspect it has much more to do with signal integrity and the transmission line characteristics of the data bus. I think that since the RAM sits right on top of the SoC on mobile CPUs, the CPUs are very likely designed with low RAM latency in mind. |
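The 10 cm figure is just signal speed times clock period. A short Python check, taking the 2/3 c signal speed from the comment as an assumption (real traces vary with the board material); `distance_per_cycle_cm` is an illustrative name:

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458


def distance_per_cycle_cm(clock_hz: float, velocity_factor: float) -> float:
    """Distance a signal covers in one clock period, in centimetres.

    velocity_factor is the signal speed as a fraction of c (assumed value).
    """
    period_s = 1.0 / clock_hz
    return SPEED_OF_LIGHT_M_PER_S * velocity_factor * period_s * 100.0


print(f"{distance_per_cycle_cm(2e9, 2 / 3):.1f} cm per cycle at 2 GHz")
```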
| ▲ | christkv 3 days ago |
| I think the M chips have a much wider data bus, so bandwidth is much higher as well as latency being lower? |
| ▲ | VHRanger 3 days ago |
| Huh, it seems like the M4 Pro can hit >400GB/s of RAM bandwidth, whereas even a 9950X only hits 100GB/s. I'm curious how that is; in practice it "feels" like my 9950X is much more efficient than an M4 at "move tons of RAM" tasks like a DuckDB workload. But then again, the 9950X has other advantages going on, like AVX-512, I guess? |
| ▲ | hnuser123456 3 days ago |
| Yes, the M-series chips effectively use several "channels" of RAM (depending on the tier/size of chip), while most desktop parts, including the 9950X, are dual-channel. You get 51.2 GB/s of bandwidth per channel of DDR5-6400. You can get 8-channel motherboards and CPUs and have 400 GB/s of DDR5 too, but you pay a price for the modularity and capacity compared with everything being integrated and soldered. DIMMs will also have worse latency than soldered chips and a max clock speed penalty due to signal degradation at the copper contacts. A Threadripper Pro 9955WX is $1649, a WRX90 motherboard is around $1200, and 8x16GB sticks of DDR5 RDIMMs are around $1200 ($2300 for 8x32GB, $3700 for 8x64GB, $6000 for 8x96GB). |
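The per-channel figure is transfer rate times the width of one channel: 6400 MT/s × 8 bytes = 51.2 GB/s. A small Python helper as a sketch; it assumes the conventional 64-bit data path per DDR5 channel and gives theoretical peak numbers only (sustained bandwidth is lower). `ddr_bandwidth_gbs` is an illustrative name.

```python
def ddr_bandwidth_gbs(mega_transfers_per_s: int, channels: int,
                      bytes_per_transfer: int = 8) -> float:
    """Theoretical peak bandwidth in GB/s (10^9 bytes per second).

    Assumes a 64-bit (8-byte) data path per channel by default.
    """
    return mega_transfers_per_s * 1e6 * bytes_per_transfer * channels / 1e9


for label, channels in [("desktop dual-channel", 2), ("8-channel WRX90", 8)]:
    print(f"DDR5-6400, {label}: {ddr_bandwidth_gbs(6400, channels):.1f} GB/s")
```

This gives 102.4 GB/s for dual-channel and 409.6 GB/s for eight channels, matching the ~100 GB/s and ~400 GB/s figures in the thread.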
| ▲ | christkv 2 days ago |
| Or you can get a Strix Halo 395+, which has 8 memory channels with a max of 128GB of RAM. I think it does around 400 GB/s. |
| ▲ | hnuser123456 2 days ago |
| From what I see, Strix Halo has a 256-bit memory bus, which would be like quad-channel DDR5, but it's soldered, so it can run at 8000 MT/s, which comes out to 256 GB/s. |
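The same arithmetic can be expressed in terms of total bus width, which is how the 256 GB/s Strix Halo figure falls out: 256 bits / 8 = 32 bytes per transfer, times 8000 MT/s. A minimal Python sketch (theoretical peak only; `bus_bandwidth_gbs` is an illustrative name), with a 128-bit DDR5-6400 desktop bus for comparison:

```python
def bus_bandwidth_gbs(bus_width_bits: int, mega_transfers_per_s: int) -> float:
    """Theoretical peak bandwidth in GB/s from total bus width and transfer rate."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * mega_transfers_per_s * 1e6 / 1e9


print(f"256-bit @ 8000 MT/s: {bus_bandwidth_gbs(256, 8000):.0f} GB/s")
print(f"128-bit @ 6400 MT/s: {bus_bandwidth_gbs(128, 6400):.1f} GB/s")
```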
| ▲ | christkv a day ago |
| Yeah, you're right. Still up from the other consumer platforms. |
|
|
|
|
| ▲ | formerly_proven 2 days ago |
| > I mean at 2GHz, and 2/3c, the signal travels about 10cm in 1 clock cycle. So it's not negligible

That's 0.5 ns. If you look at end-to-end memory latencies, which are usually around 100 ns for mobile systems, that actually is negligible, and M-series chips do not have particularly low memory latency (they trend higher in comparison). |
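To put the wire delay in perspective, here is a rough Python estimate of round-trip propagation time as a fraction of a ~100 ns end-to-end memory latency. The trace lengths are hypothetical, the 2/3 c signal speed is taken from the earlier comment, and the model ignores everything except propagation (no memory controller, DRAM array or queueing time). `round_trip_ns` is an illustrative name.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458


def round_trip_ns(trace_length_cm: float, velocity_factor: float = 2 / 3) -> float:
    """Round-trip propagation time in ns over a trace of the given one-way length."""
    one_way_s = (trace_length_cm / 100.0) / (SPEED_OF_LIGHT_M_PER_S * velocity_factor)
    return 2 * one_way_s * 1e9


TOTAL_LATENCY_NS = 100.0  # ballpark end-to-end DRAM latency quoted in the comment

for length_cm in (1, 5, 10):
    wire = round_trip_ns(length_cm)
    print(f"{length_cm:>2} cm trace: {wire:.2f} ns round trip, "
          f"{wire / TOTAL_LATENCY_NS:.1%} of {TOTAL_LATENCY_NS:.0f} ns")
```

Even a 10 cm trace adds only about 1 ns round trip, roughly 1% of the total.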
|
|