mchusma 3 hours ago

Everything I read seems to suggest that RAM capacity is going to grow at 20-25% a year, which just doesn't seem good enough. Even in consumer use cases, phones and laptops would benefit greatly from double the amount of RAM. And then obviously, the AI need is gigantic.

I don't see it going away. I mean, it may not grow as fast as now, but I don't see it going away either. I get why the memory makers do not want to bankrupt themselves, but it feels like there's got to be some way to push that risk off onto model providers and other people in the ecosystem to allow us to grow RAM capacity more like 50% per year.
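
To put rough numbers on the gap between those growth rates, here is a quick back-of-the-envelope sketch. It assumes plain compound annual growth and nothing else; the function name is made up for illustration.

```python
import math

def years_to_double(annual_growth):
    """Years for capacity to double under compound annual growth."""
    return math.log(2) / math.log(1 + annual_growth)

# Growth rates mentioned in the comment above.
for rate in (0.20, 0.25, 0.50):
    print(f"{rate:.0%} per year: doubles in {years_to_double(rate):.1f} years")
```

At 20-25% a year, doubling capacity takes somewhere between three and four years; at 50% a year it takes under two.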

regularfry an hour ago | parent | next [-]

The OpenAI deal would be absorbed by two years of that. And it would be inefficient for the RAM makers in a competitive market to leave buyers unsold-to.

I don't actually know what the rate of growth before October was; I'm sure someone round here will, though.

foota 2 hours ago | parent | prev | next [-]

In theory the new futures markets for chip components would help here, since it would allow DRAM suppliers to insulate themselves from that risk.

minraws 3 hours ago | parent | prev | next [-]

I mean, the biggest risk is that China's CXMT benefits by capturing the markets that others are leaving hanging, and is then able to compete and push the others out when costs start to normalize.

As for 20-25% growth not being enough, I think it's not that far off, if we assume data center build out plans hit a wall and slow down significantly, and the AI heat starts to cool off.

20-25% may not be enough in the short term, but if the AI build-out stops within this year, we will have a massive oversupply instead of an undersupply.

blululu 2 hours ago | parent | next [-]

Looking at the history of the memory industry, the biggest risk is that a firm would overproduce and go bankrupt. Maybe this time is different, but so far no memory chip maker has gone under because their competition increased capacity.

minraws 2 hours ago | parent [-]

I might be wrong but your second point can't be true if the first one is true.

Let me explain: imagine CXMT grows massively and builds a lot of fabs, so much so that it becomes the leader in multiple segments, and then market demand cools off.

Then CXMT, the company that invested massively, has an oversupply, so it undercuts every other memory company.

In other words, Samsung and SK Hynix are dead, and to protect Micron the US now has a 10000% tariff on the supply of memory.

Imagine that, because it has happened before. If you don't play the boom-and-bust game, someone else will, because the market is very large during a boom. And the player scaling up the most generally isn't the one with margins to protect, and generally has the ability to undercut others.

Asian memory chip giants were made by undercutting European and American companies. American companies adapted by moving manufacturing to Asia, and European ones got bought for pennies or dissolved.

galangalalgol 2 hours ago | parent | prev | next [-]

Is there any indication that research is being focused on reducing the memory footprint of inference for frontier-class models? Is the low-hanging fruit already gone there?

minraws 2 hours ago | parent | next [-]

Low-hanging? How low-hanging are we talking? The basic stuff is gone. The big challenges around quantization were largely solved 2 years ago, and we have just been improving from there.

But can massive gains still be made? Definitely.

The entire AI hype is based on the paper "Attention Is All You Need", and attention basically means loading a huge matrix of all the tokens into memory. How well you can optimize this attention layer is basically how most architectures try to improve performance and memory usage.

The only one with significant gains in it is DeepSeek (or so I would like to believe, because others don't make their work open for folks like me, outside the big AI labs, to read). Their MLA architecture reduced KV-cache memory requirements by up to 90%; ofc that's a purely architectural change.

With some quantization like TurboQuant from Google you could push it down to ~1/3 of that. So roughly 96-97% memory savings when talking about the KV cache.
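
Rough arithmetic for the above, as a sketch. It assumes the usual KV-cache size formula for plain multi-head attention (2 x layers x KV heads x head dimension x tokens x bytes per element); the model dimensions are hypothetical, and the 90% and 1/3 factors are just the figures quoted above, not measurements.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, tokens, bytes_per_elem):
    # Factor 2: one K and one V vector per token, per head, per layer.
    return 2 * layers * kv_heads * head_dim * tokens * bytes_per_elem

# Hypothetical model dimensions, chosen only for illustration.
baseline = kv_cache_bytes(layers=60, kv_heads=64, head_dim=128,
                          tokens=128_000, bytes_per_elem=2)

after_mla = baseline * (1 - 0.90)   # "up to 90%" reduction, as quoted above
after_quant = after_mla / 3         # "~1/3 of that", as quoted above
savings = 1 - after_quant / baseline

print(f"baseline:      {baseline / 2**30:.1f} GiB")
print(f"after MLA:     {after_mla / 2**30:.1f} GiB")
print(f"after quant:   {after_quant / 2**30:.1f} GiB")
print(f"total savings: {savings:.1%}")
```

The combined saving does not depend on the model dimensions: 10% of the baseline divided by three leaves about 3.3%, i.e. a saving of about 96.7%.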

But the models are close to being saturated for quantization based memory optimizations. We will have to see some architectural changes for a significant shift now.

aurareturn 2 hours ago | parent | prev [-]

If they manage to make memory more efficient, they’ll just increase the context size and/or model size.

We just haven’t reached the point of diminishing returns for gen AI capabilities yet.

Models will get more useful if you have higher context size or higher param size. Then people will just use the models even more, leading to even more memory demand.

zx8080 3 hours ago | parent | prev [-]

What is the risk? Competition is good for consumers.

LPisGood 3 hours ago | parent [-]

The risk is to the business, not the consumers.

bigbadfeline an hour ago | parent [-]

There's no risk to businesses that are paying bonuses of $1 million per worker per year, like the RAM makers Samsung and SK Hynix.

They are drowning in money but they don't invest in new production in order to maintain high prices. By doing so, they form a virtual trust with monopoly control over pricing. What you call "risk" for them is our best hope, China can't enter the market soon enough.

Oops, the US government is blocking the Chinese chip industry in every way possible and thus becomes a de facto member of the aforementioned anti-competitive and anti-consumer trust.

minraws 17 minutes ago | parent [-]

Micron is a US company, and the US did the same against Japan in the past.

DoctorOetker 2 hours ago | parent | prev [-]

According to the recent article, HBM is 3x less efficient than LPDDR in terms of wafer area, but the bandwidth is more than triple.

What if it's in everyone's interest to buy computers at say 1/3rd the rate and switch everything over to HBM?

The discrepancy between compute and memory has been growing for ages; perhaps a painful switch to HBM is exactly what we need?

Would you rather have 3 intermediate computers with low memory bandwidth, or wait a little longer on average, so that we all get a new computer at 1/3rd the rate but with a bandwidth gain that is larger than the area penalty?

FuckButtons 2 hours ago | parent | next [-]

These are fundamentally different points in design space though; HBM doesn’t have a 10 mW idle draw like LPDDR does.

aurareturn 2 hours ago | parent | prev | next [-]

Can’t put HBM in smartphones and laptops. The power drain is too great.

thfuran 2 hours ago | parent | prev [-]

Not many workloads are RAM bandwidth limited. Power and latency are much more common bottlenecks, and HBM loses on both of those.

zozbot234 17 minutes ago | parent | next [-]

Multicore workloads do tend to hit RAM bandwidth limits before they hit power constraints. If you do the math, running at max frequency and core utilization would usually imply you could only access a byte or so per core clock cycle. Perhaps a mere handful of bytes for the highest-performance systems with in-package RAM.
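
The math goes roughly like this. The bandwidth, core count, and clock figures below are made-up round numbers for illustration, not measurements of any particular system.

```python
def bytes_per_core_cycle(bandwidth_gb_s, cores, clock_ghz):
    """Memory bytes available per core per clock cycle with all cores busy."""
    bytes_per_second = bandwidth_gb_s * 1e9
    core_cycles_per_second = cores * clock_ghz * 1e9
    return bytes_per_second / core_cycles_per_second

# Hypothetical desktop-class system: 100 GB/s shared by 16 cores at 5 GHz.
print(bytes_per_core_cycle(100, 16, 5.0))
# Hypothetical in-package-RAM system: 800 GB/s shared by 24 cores at 4 GHz.
print(bytes_per_core_cycle(800, 24, 4.0))
```

With those inputs you get about 1.25 bytes per core cycle for the first system and about 8.3 for the second.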

pastel8739 an hour ago | parent | prev [-]

Isn’t memory bandwidth super relevant for AI?