fluoridation 3 days ago

>However a standard DDR5 dimm is not 1x64 bit, it's actually 2x32 bit. Thus 2 DDR5 dimms = 4 channels.

Uh, surely that depends on how the motherboard is wired. Just because each DIMM has half its pins on one channel and the other half on another doesn't mean 2 DIMMs = 4 channels. It could just be that the top pins across all the DIMMs are on one channel and the bottom ones are on another.

sliken 2 days ago

I think there's a standard wiring for the DIMM, and some parts are shared. Each normal DDR5 DIMM has 2 subchannels that are 32 bits each, and there's a new specification, the HUDIMM, which will only enable 1 subchannel and so only have half the bandwidth.

I don't think you can wire up DDR5 DIMMs willy-nilly as if they were 2 separate 32-bit DIMMs.

fluoridation 2 days ago

Well, I don't know what to tell you. I'm not a computer engineer, but I assume Gigabyte has at least a few of those, and they're labeling the X870E boards with 4 DIMM slots as "dual channel". I feel like if they were actually quad channel they'd jump at the chance to put a bigger number, so I'm compelled to trust the specs.

sliken 2 days ago

In computer manufacturer speak, dual channel = 2 x 64 bit = 128 bits wide.

So with 2 DIMMs or 4 you still get 128-bit-wide memory. With DDR4 that means 2 channels x 64 bits each. With DDR5 that means 4 channels x 32 bits each.
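The width arithmetic above, as a minimal Python sketch. The helper name `total_width_bits` is hypothetical; it simply multiplies channel count by per-channel width to show that both "dual channel" setups come out to the same 128-bit total.

```python
# Hypothetical helper: total memory bus width = number of channels x width of each.
def total_width_bits(channels: int, bits_per_channel: int) -> int:
    return channels * bits_per_channel


# "Dual channel" DDR4: two 64-bit channels.
ddr4_width = total_width_bits(channels=2, bits_per_channel=64)
# "Dual channel" DDR5: two DIMMs, each exposing two 32-bit subchannels.
ddr5_width = total_width_bits(channels=2 * 2, bits_per_channel=32)

print(ddr4_width, ddr5_width)  # 128 128
```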

Keep in mind that the DDR4/5 memory controller is in the CPU. The motherboard's job is to connect the right pins on the DIMMs to the right pins on the CPU socket. The days of an off-chip memory controller/northbridge are long gone.

So if you look at an AM5 CPU it clearly states:

   * Memory Type: DDR5-only (no DDR4 compatibility).

   * Channels: 2 Channel (Dual-Channel).

   * Memory Width: 2x32-bit sub-channels (128-bit total for 2 sticks).

fluoridation 2 days ago

Why are you quoting something that contradicts you? It clearly states it's a dual-channel memory architecture with 32-bit subchannels. The fact that two different words are used means they mean different things.

>In computer manufacture speak dual channel = 2 x 64 bit = 128 bits wide.

Yes, because AMD64 has 64-bit words. You can't satisfy a 64-bit load or store with just 32 bits (unless you take twice as long, of course). Having four 32-bit subchannels doesn't mean you can execute 4 simultaneous independent 32-bit memory operations. A 64-bit channel capable of a full operation still needs to be assembled out of multiple 32-bit subchannels. If you install a single stick you don't get any parallelism in your memory operations, i.e. the system runs in single-channel mode, with the single stick fulfilling only one request at a time.

sliken 2 days ago

AM5 is AMD's standard, so it's accurate. It seems rather pedantic to differentiate between "2 subchannels per DIMM" and "4 x 32-bit channels for a total of 128 bits".

However, the motherboard vendors annoyingly hide that from you by calling DDR4 dual channel (2 x 64 bit, which means two outstanding cache misses, one per channel) and glossing over the difference by calling DDR5 dual channel too (4 x 32 bit, which means 4 outstanding cache misses).

> Yes, because AMD64 has 64-bit words.

It's a bit more complicated than that. First, you have 3 levels of cache; a miss in the last of them triggers a cache line load, which is 64 bytes (not 64 bits). That load goes to one of the 4 channels, and there's a long latency before the first data arrives. Then there's the complication of opening the row, which makes its columns available and can speed things up if you need more than one column from the same row. But the general idea is that you get at most one cache line per channel each time you wait out the memory latency.
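To make the 64-byte vs 64-bit distinction concrete, here is a small Python sketch counting how many bus transfers one cache line takes at a given bus width. The function name is hypothetical; the 8 and 16 results line up with DDR4's burst length of 8 on a 64-bit channel and DDR5's burst length of 16 on a 32-bit subchannel.

```python
CACHE_LINE_BYTES = 64  # typical x86 cache line size


def transfers_per_line(bus_width_bits: int, line_bytes: int = CACHE_LINE_BYTES) -> int:
    """Number of bus transfers (beats) needed to move one cache line."""
    bytes_per_transfer = bus_width_bits // 8
    if bytes_per_transfer == 0 or line_bytes % bytes_per_transfer:
        raise ValueError("line size must be a whole number of bus-width transfers")
    return line_bytes // bytes_per_transfer


print(transfers_per_line(64))  # 8: a 64-bit DDR4 channel (burst length 8)
print(transfers_per_line(32))  # 16: a 32-bit DDR5 subchannel (burst length 16)
```

Either way a whole 64-byte line arrives per request; the narrower subchannel just needs twice as many beats to deliver it.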

So DDR4 on a 128-bit system can have 2 cache lines in flight, i.e. 128 bytes per memory latency. On a DDR5 system you can have 4 cache lines in flight per memory latency. Sure, you need the bandwidth, and 32-bit channels have half the bandwidth per clock, but the trick is that the memory bus spends most of its time waiting on memory to start a transfer. So waiting 50 ns and then getting 32 bits @ 8000 MT/s isn't that different from waiting 50 ns and getting 64 bits @ 8000 MT/s.
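A rough worked version of that last comparison, as a Python sketch. The 50 ns latency and 8000 MT/s rate are the illustrative figures from the comment, and `line_fetch_ns` is a hypothetical helper; the model deliberately ignores row activation, refresh, queuing, and bank conflicts.

```python
def line_fetch_ns(latency_ns: float, bus_width_bits: int, mt_per_s: int,
                  line_bytes: int = 64) -> float:
    """Simplified model: fixed latency to first data, then stream one cache line."""
    transfers = line_bytes // (bus_width_bits // 8)
    # mt_per_s is in megatransfers per second, so one transfer takes 1000 / mt_per_s ns.
    return latency_ns + transfers * 1000 / mt_per_s


narrow = line_fetch_ns(50, 32, 8000)  # 32-bit subchannel: 50 + 16 * 0.125 = 52.0 ns
wide = line_fetch_ns(50, 64, 8000)    # 64-bit channel:    50 + 8 * 0.125  = 51.0 ns
print(narrow, wide)  # 52.0 51.0
```

Under these assumptions the narrower subchannel costs about 1 ns extra per line, which is small next to the 50 ns spent waiting.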

Each 32-bit subchannel can handle a unique address, which is turned into a row/column, and do a separate transfer when done. So a normal DDR5 system can look up 4 addresses in parallel, wait for the memory latency, and return a 64-byte cache line from each.
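A toy Python model of that idea: cache-line addresses are spread across 4 subchannels, and we count how many rounds of memory latency a batch of misses needs if each subchannel serves one line per round (the comment's simplification). The modulo interleave and the function names are illustrative assumptions only; real memory controllers use their own, often hashed, address mappings.

```python
CACHE_LINE_BYTES = 64
NUM_SUBCHANNELS = 4  # 2 DIMMs x 2 subchannels each


def subchannel_for(addr: int) -> int:
    """Toy interleave (assumption): consecutive cache lines rotate across subchannels."""
    return (addr // CACHE_LINE_BYTES) % NUM_SUBCHANNELS


def latency_rounds(miss_addrs: list[int]) -> int:
    """Rounds of memory latency needed if each subchannel serves one line per round."""
    per_channel = [0] * NUM_SUBCHANNELS
    for addr in miss_addrs:
        per_channel[subchannel_for(addr)] += 1
    return max(per_channel)


spread = [0x1000, 0x1040, 0x1080, 0x10C0]   # four consecutive cache lines
clumped = [0x1000, 0x1100, 0x1200, 0x1300]  # four lines that all map to subchannel 0
print([subchannel_for(a) for a in spread])  # [0, 1, 2, 3]
print(latency_rounds(spread), latency_rounds(clumped))  # 1 4
```

Misses that land on different subchannels overlap in one latency period; misses that collide on one subchannel serialize in this model.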

Even better is something like Strix Halo, which actually has a 256-bit-wide memory system (twice that of any normal tablet, laptop, or desktop) but also has 16 channels x 16 bits, so it can handle 16 cache misses in flight. I suspect this is mostly to keep its aggressive iGPU fed.
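The three configurations discussed in this thread, compared in a short Python sketch under the same "one outstanding cache line per channel" simplification used above. That model is the thread's, not a description of how any particular memory controller schedules requests; `summarize` and `CONFIGS` are hypothetical names.

```python
CACHE_LINE_BYTES = 64

# (channels, bits per channel) for each configuration discussed in the thread.
CONFIGS = {
    "desktop DDR4": (2, 64),
    "desktop DDR5": (4, 32),
    "Strix Halo": (16, 16),
}


def summarize(channels: int, bits: int) -> tuple[int, int, int]:
    """Return (total bus width in bits, cache lines in flight, bytes in flight)."""
    width = channels * bits
    lines_in_flight = channels  # one outstanding line per channel (simplified model)
    return width, lines_in_flight, lines_in_flight * CACHE_LINE_BYTES


for name, (ch, bits) in CONFIGS.items():
    print(name, summarize(ch, bits))
# desktop DDR4 (128, 2, 128)
# desktop DDR5 (128, 4, 256)
# Strix Halo (256, 16, 1024)
```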