Remix clone Hacker News

	▲	fluoridation 2 days ago
		Why are you quoting something that contradicts you? It clearly states it's a dual-channel memory architecture with 32-bit subchannels. The fact that two different words are used means they mean different things.

		> In computer manufacture speak dual channel = 2 x 64 bit = 128 bits wide.

		Yes, because AMD64 has 64-bit words. You can't satisfy a 64-bit load or store with just 32 bits (unless you take twice as long, of course). Getting four 32-bit subchannels doesn't mean you can execute four simultaneous independent 32-bit memory operations. A 64-bit channel capable of a full operation still needs to be assembled out of multiple 32-bit subchannels. If you install a single stick you don't get any parallelism in your memory operations; i.e. the system runs in single-channel mode, with the single stick fulfilling only one request at a time.
	▲	sliken 2 days ago
		AM5 is the AMD standard, and the description is accurate. It seems rather pedantic to differentiate between 2 subchannels per DIMM and 4 x 32-bit channels for a total of 128 bits. However, the motherboard vendors annoyingly hide that from you by claiming DDR4 is dual channel (2 x 64 bit, which means two outstanding cache misses, one per channel) and just glossing over the difference by saying DDR5 is dual channel (4 x 32 bit, which means 4 outstanding cache misses).

		> Yes, because AMD64 has 64-bit words.

		It's a bit more complicated than that. First you have 3 levels of cache; a miss in the last of them triggers a cache line load, which is 64 bytes (not 64 bits). That goes to one of the 4 channels, and there's a long latency for the first 64 bits. Then there's the complication of opening the row, which makes its columns available and can speed things up if you need more than one column from that row. But the general idea is that you get at most one cache line per channel after waiting for the memory latency.

		So DDR4 on a 128-bit system can have 2 cache lines in flight, i.e. 128 bytes per memory latency. On a DDR5 system you can have 4 cache lines in flight per memory latency. Sure, you need the bandwidth, and 32-bit channels have half the bandwidth per clock, but the trick is that the memory bus spends most of its time waiting on memory to start a transfer. So waiting 50 ns and then getting 32 bits @ 8000 MT/s isn't that different from waiting 50 ns and getting 64 bits @ 8000 MT/s.

		Each 32-bit subchannel can handle a unique address, which is turned into a row/column, with a separate transfer when done. So a normal DDR5 system can look up 4 addresses in parallel, wait for the memory latency, and return a 64-byte cache line for each.

		Even better is something like Strix Halo, which actually has a 256-bit-wide memory system (twice that of any normal tablet, laptop, or desktop) but also has 16 channels x 16 bit, so it can handle 16 cache misses in flight. I suspect this is mostly to keep its aggressive iGPU fed.
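
The arithmetic in the comment above can be checked with a rough back-of-the-envelope sketch. It is a simplified model, not a DRAM simulator: the function names are made up for illustration, the 50 ns latency and 8000 MT/s rate are the round numbers used in the comment, and it assumes each (sub)channel services exactly one 64-byte cache line at a time, ignoring row-buffer hits, bank parallelism, and controller queuing.

```python
CACHE_LINE_BYTES = 64  # cache line size stated in the comment


def burst_ns(channel_bits: int, mt_per_s: float) -> float:
    """Time to stream one cache line over a channel once the transfer starts.

    mt_per_s is the transfer rate in mega-transfers per second.
    """
    transfers = CACHE_LINE_BYTES * 8 // channel_bits
    # transfers / (mt_per_s * 1e6) seconds, expressed in nanoseconds
    return transfers * 1000.0 / mt_per_s


def miss_ns(channel_bits: int, mt_per_s: float, latency_ns: float = 50.0) -> float:
    """Approximate cache-miss service time: fixed latency plus the burst."""
    return latency_ns + burst_ns(channel_bits, mt_per_s)


def peak_lines_in_flight(total_bus_bits: int, channel_bits: int) -> int:
    """Independent channels available, assuming one cache line per channel."""
    return total_bus_bits // channel_bits


# (total bus width, channel width) for the layouts discussed in the thread.
# Using the same 8000 MT/s rate for all of them is an illustrative
# simplification that mirrors the comment's comparison.
layouts = {
    "2 x 64-bit (DDR4-style dual channel)": (128, 64),
    "4 x 32-bit (DDR5-style dual channel)": (128, 32),
    "16 x 16-bit (256-bit bus)": (256, 16),
}

for name, (bus_bits, chan_bits) in layouts.items():
    lines = peak_lines_in_flight(bus_bits, chan_bits)
    t = miss_ns(chan_bits, 8000)
    print(f"{name}: {lines} lines in flight, ~{t:.1f} ns per miss")
```

Under these assumptions a 64-byte line takes about 52 ns on a 32-bit subchannel versus about 51 ns on a 64-bit channel, while the number of misses that can overlap doubles from 2 to 4, which is the point being made about latency dominating transfer time.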