roscas 5 days ago

That is amazing. Most consumer boards only support 32 or 64 GB. Having 512 GB is great!

justincormack 5 days ago

You haven't seen the price of 128GB DDR5 RDIMMs; they are maybe $1300 each.

A lot of the initial use cases for CXL seem to be reusing older DDR4 RDIMMs to expand memory in newer systems, e.g. cloud providers have a lot of them.

kvemkon 5 days ago

A Micron 128GB DDR5-5600 goes for 900 Euro (without VAT, business pricing).

tanelpoder 5 days ago

... and if you have the money, you can use 3 out of 4 PCIe5 slots for CXL expansion. So that could be 2TB DRAM + 1.5TB DRAM-over-CXL, all cache coherent thanks to CXL.mem.

I guess there are some use cases for local users, but I think the biggest wins could come from CXL shared-memory arrays in smaller clusters. You could, for example, cache the entire build side of a big hash join in the shared CXL memory and let all the other nodes performing the join see a single shared dataset (see the sketch below). Or build a "coherent global buffer cache" out of CPU+PCIe+CXL hardware, like Oracle Real Application Clusters has been doing with software+NICs for the last 30 years.
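For a rough idea of what that could look like from software: below is a minimal C sketch that maps a shared CXL memory pool and writes a hash-join build table into it. It assumes the pool is exposed to Linux as a device-DAX node; the /dev/dax0.0 path, the mapping size, and the bucket layout are all hypothetical.

    /* Sketch: map a shared CXL memory pool (exposed as device-DAX)
     * and publish a hash-join build table in it.
     * Path, size and layout are hypothetical. */
    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #define POOL_PATH "/dev/dax0.0"   /* hypothetical CXL.mem pool device */
    #define POOL_SIZE (1UL << 30)     /* map 1 GiB of the pool */

    struct bucket { uint64_t key; uint64_t row_id; };

    int main(void) {
        int fd = open(POOL_PATH, O_RDWR);
        if (fd < 0) { perror("open"); return 1; }

        /* Device-DAX mappings must be aligned to the device's page size;
         * MAP_SHARED makes the stores visible to every host attached
         * to the pool. */
        void *pool = mmap(NULL, POOL_SIZE, PROT_READ | PROT_WRITE,
                          MAP_SHARED, fd, 0);
        if (pool == MAP_FAILED) { perror("mmap"); return 1; }

        /* Build side of the join: plain byte-addressable stores.
         * CXL.mem keeps these cache lines coherent across hosts. */
        struct bucket *table = pool;
        table[0].key = 42;
        table[0].row_id = 7;

        munmap(pool, POOL_SIZE);
        close(fd);
        return 0;
    }

Whether other hosts see the stores without extra fencing depends on the platform's coherence implementation, so treat this strictly as a sketch, not a recipe.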

Edit: One example of a CXL shared-memory pool device is Samsung CMM-B. It's still just an announcement; I haven't seen it in the wild. So CXL arrays might become something like SAN arrays in the future, but byte-addressable and with direct, cache-coherent loads into CPU cache.

https://semiconductor.samsung.com/news-events/tech-blog/cxl-...

cjensen 5 days ago

Both of the supported motherboards can take up to 2TB of DRAM.

reilly3000 5 days ago

Presumably this is about adding more memory channels via PCIe lanes. I'm very curious to know what kind of bandwidth one could expect with such a setup, as that is the primary bottleneck for inference speed.

Dylan16807 5 days ago

The raw speed of PCIe 5.0 x16 is about 63 billion bytes per second each way. Assuming we transfer several cache lines at a time, the protocol overhead should be pretty small, so expect 50-60 GB/s, which is on par with a single high-clocked channel of DRAM.
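Spelling that arithmetic out (a back-of-the-envelope sketch; 32 GT/s per lane and 128b/130b line coding are the standard PCIe 5.0 figures):

    /* Back-of-the-envelope PCIe 5.0 x16 bandwidth, one direction. */
    #include <stdio.h>

    int main(void) {
        double gt_per_lane = 32.0;        /* PCIe 5.0: 32 GT/s per lane */
        int lanes = 16;
        double encoding = 128.0 / 130.0;  /* 128b/130b line coding */

        /* 32 * 16 * (128/130) / 8 bits-per-byte ~= 63.0 GB/s */
        double raw_gbs = gt_per_lane * lanes * encoding / 8.0;
        printf("raw: %.1f GB/s each way\n", raw_gbs);

        /* Packet headers and flow control eat a bit more, hence
         * the 50-60 GB/s estimate for large transfers. */
        return 0;
    }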