ryao 4 days ago

AMD CPUs tend to have more memory bandwidth than Intel CPUs and inference is CPU bound, so their claim seems accurate to me.

Whether the core does a 512-bit write in 1 cycle or 2 because it is two 256-bit writes is immaterial. Memory bandwidth is bottlenecked by 64GB/sec per CCX. You need to use cores from multiple CCXs to get full bandwidth.

That said, the EPYC 9175F has 614.4GB/sec memory bandwidth and should be able to use all of it. I have one, although the machine is not yet assembled (Supermicro took 7 weeks to send me a motherboard, which delayed assembly), so I have not confirmed that it can use all of it yet.
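
As a rough check of the arithmetic above, here is a minimal sketch that only uses the two figures quoted in this comment (64GB/sec per CCX, 614.4GB/sec total). The 12-channel DDR5-6400 breakdown is my assumption about where 614.4 comes from, and the function names are made up for illustration:

```python
import math

def ccxs_needed(total_gb_s: float, per_ccx_gb_s: float) -> int:
    """Smallest number of CCXs whose combined link bandwidth covers the DRAM bandwidth."""
    return math.ceil(total_gb_s / per_ccx_gb_s)

def dram_peak_gb_s(channels: int, mt_per_s: int, bytes_per_transfer: int = 8) -> float:
    """Theoretical peak DRAM bandwidth in GB/sec (1 GB = 1e9 bytes)."""
    return channels * mt_per_s * bytes_per_transfer / 1000

# Figures quoted in the comment above.
TOTAL_GB_S = 614.4
PER_CCX_GB_S = 64.0

# Assumption: 12 channels of DDR5-6400, 8 bytes per transfer.
assumed_peak = dram_peak_gb_s(channels=12, mt_per_s=6400)

print(ccxs_needed(TOTAL_GB_S, PER_CCX_GB_S))  # CCXs needed to cover the quoted DRAM figure
print(16 * PER_CCX_GB_S)                      # aggregate CCX-side limit with 16 CCXs
print(assumed_peak)                           # peak under the assumed DIMM configuration
```

With these numbers, 10 CCXs are already enough to cover 614.4GB/sec, and 16 CCXs give an aggregate CCX-side limit of 1024GB/sec, so on paper the CCX links are not the limit on this part. Whether the hardware actually gets there is a separate question.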

ryao 4 days ago | parent | next [-]

> inference is CPU bound

This was a typo. It should have been “inference is memory bandwidth bound”.
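
To make "memory bandwidth bound" concrete, here is a back-of-the-envelope sketch. It assumes that generating each token reads every weight from RAM once, which gives an upper bound on tokens per second. The 40 GB model size is a hypothetical number picked for illustration, not something measured, and the function name is made up:

```python
def max_tokens_per_s(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Upper bound on decode speed if each token streams all weights from RAM once."""
    if model_size_gb <= 0:
        raise ValueError("model size must be positive")
    return bandwidth_gb_s / model_size_gb

BANDWIDTH_GB_S = 614.4  # figure quoted upthread
MODEL_SIZE_GB = 40.0    # hypothetical model size, for illustration only

print(max_tokens_per_s(BANDWIDTH_GB_S, MODEL_SIZE_GB))
```

Real numbers will be lower, since this ignores KV cache reads, compute time, and the gap between theoretical and achieved bandwidth.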

menaerus 4 days ago | parent | prev | next [-]

Interesting design. 16 CCDs / 16 CCXs / 16 cores: 1 core per CCD and 1 CCX per CCD. With 512MB of L3 cache, this CPU should be able to use ~all of its ~10 TB/s of L3 memory bandwidth out of the box.

How much is it going to cost you to build the box?

adgjlsfhk1 4 days ago | parent | prev [-]

You can get higher write bandwidth than the CCX bandwidth by having multiple writes go to the same L2 address before the data goes out to RAM.