kurthr 3 days ago

What is missing even in this article is the install and expected failure rate of the dominant GB300 servers. The numbers I heard were "~15% annual failure, and it's not worth trying to swap/repair". At that rate, more than half of each install is down within 5 years. Of course they can install the NEW GX500turbo servers, which have 4x the compute but are 2x more power hungry. How much will that cost? What is the hyperscaler write-down, ~$200B/yr? Better have some income to make that up. They've got only 3 years to get there.

That still means All New data centers. They aren't being built for this now, so the old ones will have to get ripped out and rebuilt (in place?) before they get the new servers. I do think they've planned the external power delivery, but not cooling or IP infra. It's a CF.
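
For reference, the "more than half down in 5 years" figure can be checked with a quick compounding calculation. This is only a sketch: it assumes the quoted ~15% is a constant, independent annual failure rate and that nothing is repaired or replaced, and the function name `surviving_fraction` is made up for illustration.

```python
def surviving_fraction(annual_failure_rate: float, years: int) -> float:
    """Fraction of servers still running after `years`.

    Assumes a constant annual failure rate applied to the units that are
    still alive each year, with no repairs or replacements.
    """
    return (1.0 - annual_failure_rate) ** years


if __name__ == "__main__":
    rate = 0.15  # the "~15% annual failure" figure quoted in the comment
    for year in range(1, 6):
        alive = surviving_fraction(rate, year)
        print(f"year {year}: {alive:.1%} up, {1 - alive:.1%} down")
```

Under those assumptions about 52% of the fleet is still up after 4 years and about 44% after 5, so the install drops below half somewhere in year 5. A failure rate that rises with age would get there sooner.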

fancyfredbot 2 days ago | parent | next [-]

The article is right to focus on the end customer and not on the hyperscalers.

The hyperscalers are not the ones having trouble generating income. They have plenty of paying customers. They certainly understand capital depreciation and the need to refresh hardware. Premature hardware failure will be charged back to Nvidia, who are not exactly struggling for cash either.

kurthr 2 days ago | parent [-]

You don't charge back to your supplier and expect to get the next allocation of servers.

fancyfredbot 2 days ago | parent [-]

Not sure why you would think that. Nobody is going to pass up a hyperscaler because they had a 15% chargeback. Margins are well over 50%, and you get paid for every sale, while high failure rates are a different department's problem.

kurthr a day ago | parent [-]

I don't know why you think NVIDIA needs any one of the hyperscalers, or would give new scarce supply to a customer trying to retroactively cut into their profits. You should read this.

https://semianalysis.com/2025/08/20/h100-vs-gb200-nvl72-trai...

fancyfredbot 11 hours ago | parent [-]

I'd read the SemiAnalysis article, and while it's excellent as usual, I don't see anything in there which says nobody RMAs defective GPUs. Perhaps there's something behind the paywall I'd missed?

I'm not at a hyperscaler, but I've been involved with deployment of A100 and H100 GPUs, and we RMA GPUs which don't work. I don't think it impacted our allocations, which have always seemed fine to me, but obviously it's hard to know for sure, and perhaps GB200 is different.

You are right that in theory NVDA can sell everything they produce without the hyperscalers, but strategically there are many risks in acting that way towards deep-pocketed clients. They'd have to go to another customer who is likely to be less reliable. They'd put themselves on very shaky ground legally. They'd create a much stronger incentive for a deep-pocketed client to become a competitor (cf. Trainium, TPUs). I'd be surprised if they'd take such risks to avoid what's ultimately a small cost.
