p1necone a day ago

I'm holding out for someone to ship a GPU with DIMM slots on it.

tymscar a day ago | parent | next [-]

DDR5 is a couple of orders of magnitude slower than really good VRAM. That’s one big reason.

zrm 20 hours ago | parent | next [-]

DDR5 is ~8GT/s, GDDR6 is ~16GT/s, GDDR7 is ~32GT/s. GDDR is faster, but the difference isn't crazy, and if the premise was to have a lot of slots then you could also have a lot of channels. 16 channels of DDR5-8200 would have slightly more memory bandwidth than an RTX 4090.
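
The comparison above can be checked with a rough back-of-the-envelope sketch. It assumes a 64-bit DDR5 channel and the RTX 4090's published 384-bit bus at 21 GT/s; the function name `bandwidth_gb_s` is made up for illustration, and the numbers are theoretical peaks, not measured throughput.

```python
def bandwidth_gb_s(transfers_mt_s: float, bus_width_bits: int) -> float:
    """Theoretical peak bandwidth in GB/s: transfers per second times bytes per transfer."""
    bytes_per_transfer = bus_width_bits / 8
    return transfers_mt_s * 1e6 * bytes_per_transfer / 1e9


# Assumption: one DDR5 channel is 64 bits wide, running at 8200 MT/s.
ddr5_16ch = 16 * bandwidth_gb_s(8200, 64)

# Assumption: RTX 4090 uses a 384-bit bus at 21 GT/s (published spec).
rtx_4090 = bandwidth_gb_s(21000, 384)

print(f"16x DDR5-8200: {ddr5_16ch:.1f} GB/s")
print(f"RTX 4090:      {rtx_4090:.1f} GB/s")
```

Under those assumptions the 16-channel DDR5 setup comes out a few percent ahead, which matches "slightly more".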

tymscar 14 hours ago | parent [-]

Yeah, so DDR5 is 8GT/s and GDDR7 is 32GT/s. Bus width is 64 bits vs 384 bits. That already makes the VRAM 4*6 (24) times faster.

You can add more channels, sure, but each channel makes it less and less likely for you to boot. Look at modern AM5 struggling to boot at over 6000MT/s with more than two sticks.

So you’d have to get an insane six channels to match the bus width, at which point your only choice to be stable would be to lower the speed so much that you’re back to the same orders of magnitude difference, really.

Now we could instead solder that RAM, move it closer to the GPU and cross-link channels to reduce noise. We could also increase the speed and oh, we just invented soldered-on GDDR…

zrm 9 hours ago | parent [-]

> Bus width is 64 vs 384.

The bus width is the number of channels. They don't call them channels when they're soldered but 384 is already the equivalent of 6. The premise is that you would have more. Dual socket Epyc systems already have 24 channels (12 channels per socket). It costs money but so does 256GB of GDDR.

> Look at modern AM5 struggling to boot at over 6000 with more than two sticks.

The relevant number for this is the number of sticks per channel. With 16 channels and 64GB sticks you could have 1TB of RAM with only one stick per channel. Use CAMM2 instead of DIMMs and you get the same speed and capacity from 8 slots.
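
A quick sketch of the arithmetic in this comment, for anyone who wants to plug in other numbers. It assumes 64-bit channels and a 128-bit (dual-channel) CAMM2 module; the variable names are made up for illustration.

```python
CHANNEL_BITS = 64        # assumption: width of one DDR5 channel
CAMM2_MODULE_BITS = 128  # assumption: one CAMM2 module spans two channels

channels = 16
stick_gb = 64
sticks_per_channel = 1

# 384-bit soldered GDDR bus expressed as 64-bit channel equivalents.
gddr_equiv_channels = 384 // CHANNEL_BITS

# Total capacity with one stick per channel.
total_gb = channels * sticks_per_channel * stick_gb

# Number of CAMM2 slots needed to cover the same total bus width.
camm2_slots = channels * CHANNEL_BITS // CAMM2_MODULE_BITS

print(gddr_equiv_channels, total_gb, camm2_slots)
```

With these inputs the 384-bit bus is 6 channel equivalents, the capacity is 1024GB (1TB), and 8 CAMM2 slots cover all 16 channels.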

dawnerd a day ago | parent | prev | next [-]

But it would still be faster than splitting the model up on a cluster though, right? But I’ve also wondered why they haven’t just shipped GPUs like CPUs.

cogman10 a day ago | parent [-]

Man I'd love to have a GPU socket. But it'd be pretty hard to get a standard going that everyone would support. Look at sockets for CPUs, we barely had crossover for like 2 generations.

But boy, a standard GPU socket so you could easily BYO cooler would be nice.

estimator7292 13 hours ago | parent [-]

The problem isn't the sockets. It costs a lot to spec and build new sockets; we wouldn't swap them for no reason.

The problem is that the signals and features that the motherboard and CPU expect are different between generations. We use different sockets on different generations to prevent you plugging in incompatible CPUs.

We used to have cross-generational sockets in the 386 era because the hardware supported it. Motherboards weren't changing so you could just upgrade the CPU. But then the CPUs needed different voltages than before for performance. So we needed a new socket to not blow up your CPU with the wrong voltage.

That's where we are today. Each generation of CPU wants different voltages, power, signals, a specific chipset, etc. Within the same ±1 generation you can swap CPUs because they're electrically compatible.

To have universal CPU sockets, we'd need a universal electrical interface standard, which is too much of a moving target.

AMD would probably love to never have to tool up a new CPU socket. They don't make money on the motherboard you have to buy. But the old motherboards just can't support new CPUs. Thus, new socket.

cogman10 a day ago | parent | prev [-]

For AI, really good isn't really a requirement. If a middle ground memory module could be made, then it'd be pretty appealing.

anon25783 a day ago | parent | prev | next [-]

Would that be worth anything, though? What about the overhead of clock cycles needed for loading from and storing to RAM? Might not amount to a net benefit for performance, and it could also potentially complicate heat management I bet.

kristianp a day ago | parent | prev [-]

A single CAMM might suit better.