doom2 a day ago

> Bottom line is that H100 prices are near 3 year highs, A100s are still profitable to run, B200 prices are increasing, no one has enough compute.

Then why aren't the hardware manufacturers of components needed by AI companies making plans yesterday to bring new fabs online to meet demand? That isn't a gotcha question, I genuinely want to know. The money involved isn't that much compared to the money changing hands between Nvidia, Microsoft, OpenAI, etc., and it's not like they won't need to buy more RAM and GPUs once in-progress data center construction is complete, especially with any new advances in technology that might happen.

Inevitably someone will reply that hardware manufacturers don't want to be stuck losing money on a facility because the bubble popped and demand disappeared, but if Anthropic and OpenAI are going to "run laps around current big tech", it should be a no-brainer to increase production capacity.

jsnell a day ago | parent | next [-]

A new fab will need to be filled with advanced equipment like lithography machines. They are the most complex things humanity has ever built.

There is one supplier of EUV lithography machines in the world, ASML. They are basically acting as an integrator for hundreds of highly specialized components manufactured to unimaginable levels of precision. Each of those components has roughly one eligible supplier in the world, and those suppliers are operating at full capacity. To expand, they'll need yet another set of specialized and almost impossible to build equipment.

So the supply chain moves incredibly slowly, and the slowness is intrinsic due to the complexity and depth of the supply chain. It can't be fixed with just money. IIRC ASML is aiming to merely double their production of EUV lithography machines by 2030.

doom2 a day ago | parent [-]

Sure, I didn't mean to suggest that it would be easy or fast to increase manufacturing capabilities, just that the confidence I'm seeing around AI should extend to the manufacturers (if that confidence for the future growth and success of OpenAI and Anthropic is warranted). That is, the business decision to increase RAM and GPU supply should be "easy".

jsnell a day ago | parent [-]

Right, but the business decisions probably aren't the constraint at this point? (But were a year ago.)

Once the ability of the supply chain to grow has been saturated, no amount of extra confidence will make it grow faster.

aurareturn a day ago | parent | prev [-]

They are. They're making as many fabs as they can as fast as they can.

The bottleneck is ASML, who can only make so many EUV machines. No one else can make EUV machines.

Scaling chip fabs and chip equipment is much harder. And you have to understand that chip fabs go bankrupt if demand suddenly drops, so they have to be more cautious by default.

zozbot234 a day ago | parent [-]

If you're really compute-constrained, do you really need EUV machines? You can make do with DUV fabrication nodes, albeit at somewhat higher cost. The trailing edge is where a lot of the mass-impact innovation is, e.g. trying to replicate more advanced EUV nodes with DUV multiple patterning.

aurareturn 15 hours ago | parent | next [-]

That’s what’s happening. Companies that were planning a move to advanced nodes for non-AI chips are delaying it. All the advanced nodes are going to AI or smartphone chips only.

senordevnyc 17 hours ago | parent | prev [-]

There was a good episode on Dwarkesh's podcast about this in the last few weeks, just a deep dive into the semiconductor industry and what the bottlenecks are.