AnthonyMouse 4 days ago

Nvidia uses VRAM amount for market segmentation. They can't make a 128GB consumer card without cannibalizing their enterprise sales.

Which means Intel or AMD making an affordable high-VRAM card is win-win. If Nvidia responds in kind, Nvidia loses a ton of revenue they'd otherwise have available to outspend their smaller competitors on R&D. If they don't, they keep more of those high-margin customers, but now the ones who switch to consumer cards are switching to Intel or AMD, which both makes money for the company that offers it and helps grow the ecosystem that isn't tied to CUDA.

People raise objections like "it would require higher pin counts," but that's beside the point: the increase in what people would be willing to pay for a card with more VRAM is unambiguously larger than the increase in the manufacturing cost.

It's more plausible that there could actually be global supply constraints in the manufacture of GDDR, but if that's the case then just use ordinary DDR5 and a wider bus. That's what Apple does and it's fine, and the extra pins may even cost less than what you save, because DDR is cheaper than GDDR.
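
For a rough sense of why a wider bus of ordinary DRAM can stand in for GDDR: peak bandwidth scales as bus width times per-pin data rate. The parts and rates below are illustrative assumptions, not the specs of any particular card.

```python
# Peak memory bandwidth ~= (bus width in bytes) * (per-pin data rate).
# The bus widths and rates below are illustrative assumptions, not exact specs.

def bandwidth_gbs(bus_width_bits: int, data_rate_gtps: float) -> float:
    return bus_width_bits / 8 * data_rate_gtps

# Typical consumer GDDR6 setup: 256-bit bus at ~16 GT/s per pin.
print(bandwidth_gbs(256, 16))  # ~512 GB/s
# Apple-style wide LPDDR/DDR bus: 512-bit at ~8 GT/s per pin.
print(bandwidth_gbs(512, 8))   # ~512 GB/s from cheaper, slower DRAM on a wider bus
```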

It's not clear what they're thinking by not offering this.

blitzar 4 days ago | parent | next [-]

> Intel or AMD making an affordable high-VRAM card is win-win.

100% agree. CUDA is a bit of a moat, but the earlier in the hype cycle viable alternatives appear, the more likely it is that the non-CUDA ecosystem becomes viable.

> It's not clear what they're thinking by not offering this.

They either don't like making money or have a fantasy that one day soon they will be able to sell pallets of $100,000 GPUs they made for $2.50 like Nvidia can. It doesn't take a PhD and an MBA to figure out that the only reason Nvidia has what should be a short-term market to itself is the failure of Intel, AMD, and the VC/innovation side to offer any competition.

It is such an obvious win-win that it would probably be worth skipping the engineering and just announcing the product, for sale by the end of the year, to force everyone's hand.

prmoustache 3 days ago | parent | prev | next [-]

> The increase in the amount people would be willing to pay for a card with more VRAM is unambiguously more than the increase in the manufacturing cost.

I guess you already have the paper if it is that unambiguous. Would you mind sharing the data/source?

AnthonyMouse 3 days ago | parent [-]

The cost of more pins is linear in the number of pins, and pins aren't the only component of the manufacturing cost, so a card with twice as many pins costs significantly less than twice as much to manufacture.

Cards with 16GB of VRAM exist for ~$300 retail.

Cards with 80GB of VRAM cost >$15,000 and customers pay that.

A card with 80GB of VRAM could be sold for <$1500 with five times the margin of the $300 card because the manufacturing cost is less than five times as much. <$1500 is unambiguously a smaller number than >$15,000. QED.
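
A back-of-envelope version of that claim, treating the manufacturing costs as assumptions for illustration (nobody outside the vendors has real BOM numbers):

```python
# Illustrative margin math; the cost figures are assumptions, not real BOM data.
cost_16gb   = 250                     # assumed build cost of the ~$300 16GB card
price_16gb  = 300
margin_16gb = price_16gb - cost_16gb  # $50 per unit

cost_80gb   = 5 * cost_16gb           # pessimistic: assume 5x the build cost
price_80gb  = 1500
margin_80gb = price_80gb - cost_80gb  # $250 per unit, 5x the 16GB card's margin

print(margin_16gb, margin_80gb, price_80gb < 15_000)  # 50 250 True
```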

doctorpangloss 2 days ago | parent [-]

> the manufacturing cost is less than five times as much

They don’t manufacture the RAM. This isn’t complicated. They make less margin (a percentage) in your scenario. And that’s what Wall Street cares about.

AnthonyMouse 2 days ago | parent [-]

They don't really manufacture anything. TSMC or Samsung make the chip, and Samsung, Micron, or Hynix make the RAM. Even Intel's GPUs are made at TSMC.

Also, Wall St cares about profit, not margins. If you can move a billion units with a $100 margin, they're going to like you a lot better than if you move a million units with a $1000 margin.
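
As a toy illustration of that point (the numbers are just the hypothetical volumes and margins above):

```python
# Total profit = units sold * per-unit margin; a thin margin on huge volume
# can dwarf a fat margin on small volume (hypothetical numbers from above).
high_volume = 1_000_000_000 * 100   # $100B of profit
low_volume  = 1_000_000 * 1_000     # $1B of profit
print(high_volume > low_volume)     # True
```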

singhrac 4 days ago | parent | prev | next [-]

This is almost true but not quite: I don't think much of the (dollar) spend on enterprise GPUs (H100, B200, etc.) would transfer if there were a 128 GB consumer card. The problem is both memory bandwidth (HBM) and networking (NVLink), both of which NVIDIA definitely uses to segment consumer vs. enterprise hardware.

I think your argument is still true overall, though, since there are a lot of "gpu poors" (i.e. grad students) who write/invent in the CUDA ecosystem, and they often work in single card settings.

Fwiw Intel did try this with Arctic Sound / Ponte Vecchio, but it was late out the door and did not really perform (see https://chipsandcheese.com/p/intels-ponte-vecchio-chiplets-g...). It seems like they took on a lot of technical risk; hopefully some of that transfers over to a future project, though Falcon Shores was cancelled. They really should have released some of those chips, even at a loss, but I don't know the cost of a tape-out.

AnthonyMouse 3 days ago | parent [-]

NVLink matters if you want to combine a whole bunch of GPUs, e.g. because you need more VRAM than any individual GPU comes with. Many workloads either don't care about that or don't have working sets that large, particularly if the individual GPU actually has a lot of VRAM. If you need 128GB and your GPUs have 40GB of VRAM, you need a fast interconnect. If you can get an individual GPU with 128GB, you don't.

There is also work being done to make this even less relevant, because people are already interested in e.g. running a 64GB model on four 16GB cards without a fast interconnect. The simpler implementation puts a quarter of the model on each card, split in the order the layers are used, and gets the performance equivalent of one card with 64GB of VRAM by only doing work on the card that holds that section of the model and then moving the (much smaller) output to the next card. A more sophisticated implementation does the same thing but exploits parallelism, e.g. by running four batches at once, each offset by a quarter, so that all the cards stay busy. Not all workloads can be split like this, but for some of the important ones it works.
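
A minimal sketch of the simpler scheme, assuming a PyTorch-style nn.Sequential model that can be cut into four sequential stages (the device names and the four-way split are illustrative, not a reference implementation):

```python
import torch
import torch.nn as nn

# Sketch: split a model's layers into sequential stages, one per card, and run a
# forward pass by handing the (much smaller) activations from card to card.
# Assumes the model is a plain nn.Sequential; stage boundaries are illustrative.

def shard_sequential(model: nn.Sequential, devices: list[str]) -> list[nn.Sequential]:
    layers = list(model.children())
    per_stage = (len(layers) + len(devices) - 1) // len(devices)
    return [
        nn.Sequential(*layers[i * per_stage:(i + 1) * per_stage]).to(dev)
        for i, dev in enumerate(devices)
    ]

@torch.no_grad()
def forward_sharded(stages: list[nn.Sequential], devices: list[str],
                    x: torch.Tensor) -> torch.Tensor:
    for stage, dev in zip(stages, devices):
        x = stage(x.to(dev))  # only activations cross the PCIe bus, not weights
    return x

# Usage, assuming four 16GB cards and a model whose weights total ~64GB:
# devices = ["cuda:0", "cuda:1", "cuda:2", "cuda:3"]
# stages = shard_sequential(model, devices)
# out = forward_sharded(stages, devices, batch)
```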

singhrac 3 days ago | parent [-]

I think we might just disagree about how much of the GPU spend is on small vs. large models (inference or training). I think something like 99.9% of the spending interest is on models that don't fit into 128 GB (remember the KV cache matters too). Happy to be proven wrong!
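
A rough way to see how quickly 128 GB runs out for the models most of the spend targets; the model shape below is an illustrative assumption, roughly a 70B-parameter transformer served in fp16:

```python
# Rough VRAM estimate for serving a transformer: weights + KV cache.
# All figures are illustrative assumptions, not measurements of a specific model.

def vram_gb(params_b: float, bytes_per_param: int, layers: int,
            kv_heads: int, head_dim: int, context_tokens: int,
            batch: int, kv_bytes: int = 2) -> float:
    weights = params_b * 1e9 * bytes_per_param
    # K and V per token per layer: 2 * kv_heads * head_dim * kv_bytes
    kv_cache = 2 * layers * kv_heads * head_dim * kv_bytes * context_tokens * batch
    return (weights + kv_cache) / 1e9

# ~70B parameters in fp16, 80 layers, grouped-query attention with 8 KV heads,
# 32k context, batch of 8 concurrent sequences:
print(vram_gb(70, 2, 80, 8, 128, 32_768, 8))  # ~226 GB, well past a single 128GB card
```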
