I personally think not owning their own compute is going to be an advantage.

There is a meteor headed towards all this AI investment that I don't think has been properly accounted for: what happens to all the existing hardware investments when NVidia's next architecture comes out? Blackwell (B100/B200) is the current generation, following Hopper (H100/H200). Rubin (R100, presumably R200) is the next and arrives soon. A lot of the investment hasn't been spent yet, so it will likely be spent on Rubin. But what happens when the iteration after that comes out and does 3-4x the compute for the same electricity input and same hardware cost?

Also, what happens when people can run way bigger models on consumer hardware in 5 years? The effective limit for useful local LLMs is currently ~31B parameter models because the RTX 5090 has 32GB of VRAM and Apple's shared memory architecture, which can keep bigger models in memory, just doesn't have the raw processing power.
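To make the VRAM arithmetic concrete, here is a rough sketch of the weights-only memory footprint at different quantization levels. The 31B and 32GB figures are from the comment; the function name is made up for illustration, and the estimate ignores KV cache, activations and runtime overhead, so real headroom is smaller than this suggests.

```python
def weight_memory_gb(params_billions: float, bits_per_param: int) -> float:
    """Approximate memory for model weights only, in GB (1 GB = 1e9 bytes).

    Assumption: every parameter is stored at the same bit width and nothing
    else (KV cache, activations, framework overhead) is counted.
    """
    total_bytes = params_billions * 1e9 * bits_per_param / 8
    return total_bytes / 1e9


VRAM_GB = 32  # RTX 5090 figure quoted in the comment

for bits in (16, 8, 4):
    need = weight_memory_gb(31, bits)
    fits = need < VRAM_GB
    print(f"31B params @ {bits}-bit: {need:.1f} GB of weights, fits in {VRAM_GB} GB: {fits}")
```

On these assumptions a 31B model only squeezes into 32GB at 8-bit or lower, which is roughly where the ~31B ceiling comes from.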

Anyway, the reason I argue Anthropic is in a better position (than OpenAI) is that they seem to have captured a market that may well be profitable for them as a company, specifically Claude for coding. They also haven't burnt quite as much cash as OpenAI, so they aren't in as deep a hole.

While I think local models are going to improve massively over the next few years, running them in a data center at scale is always going to be cheaper for a company. Why? Because a data center can amortize its costs by running 24/7, and power and cooling are simply cheaper at scale. Compare that with 1000+ engineers who might otherwise only be using their hardware ~40 hours a week.
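The amortization point is mostly a utilization argument, and a back-of-the-envelope sketch shows the size of the gap. The 40 hours/week figure is from the comment; treating the data center as busy all 168 hours is an idealized assumption, and the cost is normalized to 1.0 per week since no real prices are given.

```python
HOURS_PER_WEEK = 7 * 24  # 168


def utilization(active_hours_per_week: float) -> float:
    """Fraction of the week the hardware is actually doing work."""
    return active_hours_per_week / HOURS_PER_WEEK


def cost_per_active_hour(weekly_cost: float, active_hours_per_week: float) -> float:
    """Fixed weekly hardware cost spread over the hours it is in use."""
    return weekly_cost / active_hours_per_week


desk = utilization(40)               # engineer's local machine, ~40 h/week
datacenter = utilization(HOURS_PER_WEEK)  # idealized 24/7 shared hardware

# Same normalized hardware cost in both cases; only the busy hours differ.
ratio = cost_per_active_hour(1.0, 40) / cost_per_active_hour(1.0, HOURS_PER_WEEK)

print(f"local utilization:       {desk:.1%}")
print(f"data center utilization: {datacenter:.1%}")
print(f"local cost per active hour is {ratio:.1f}x the shared cost")
```

That 4.2x is before any savings on power and cooling, and it shrinks if the shared hardware isn't actually kept busy around the clock.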

IMHO Google is in the best position here of all the US companies, even though their models aren't the best, because their data centers are ruthlessly efficient, their homegrown TPUs will eventually catch up (and thus avoid the NVidia tax) and they simply haven't bet the farm on winning AI.

▲

Schiendelman 6 hours ago | parent | next [-]

I'm generally with you on all of these ideas.

However, Google probably won't catch up. Nvidia has been winning in spite of the fact that their hardware is general purpose rather than tuned for inference.

Rubin has architectural differences I don't understand that are supposed to make inference much cheaper and faster while still retaining those other more generic capabilities. Their next generation after that is going to do even better at being fast for inference and general purpose.

Google is betting that their TPUs won't depreciate faster than the markup they have to pay to Nvidia. I don't think they will be right.

▲

Der_Einzige 4 hours ago | parent | prev [-]

Why do people who don't follow the prices of A100 talk like they know things about GPU pricing dynamics?

A100s are ~7 years old and going for more than 2 dollars an hour, significantly more expensive than even 2 years ago. This is because anything with 80GB of VRAM or more and made by Nvidia will have an economically useful lifespan of like, 10 years.

I could see H100s getting 12 years.

Michael Burry doesn't know shit about GPUs.

▲

jmyeet 3 hours ago | parent [-]

So I was curious about how A100s would do running DeepSeek v4. I can't find any instances of running v4 Pro on even an 8xA100 cluster, so you need to run Flash at ~284B params. A100s don't support FP8, so you're running FP16 and taking a hit that way. I see estimates of 30-50 tok/s for an 8xA100 cluster. They're drawing 300-400W each, so you're looking at probably 3500+ Watts, which is roughly 0.01 tok/s per Watt.

Now jump ahead 2 years and you seem to have a massive jump in performance [1]. The tokens/Watt goes up by at least 2 orders of magnitude. And the B100 is 3-4x that. And we're about to hit the R100 (Rubin) cliff. That's what this is going to come down to. When hyperscaler DCs are getting to Gigawatt power usage, it all comes down to power efficiency. Those A100s aren't far from being sold for scrap.

I've been looking into how different companies are handling depreciation for this. Amazon seems to be saying the life is 3-4 years, Google 4-5 and Meta is saying 8+, which I think is wildly optimistic.

[1]: https://lambda.ai/inference-models/deepseek-ai/deepseek-v4-f...
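For anyone who wants to check the efficiency figure, here is a small sketch using only the numbers quoted above (30-50 tok/s, ~3500 W for the 8xA100 box). The function names are made up for illustration. Note that tokens per second per watt is the same thing as tokens per joule, and the kWh figure is just a unit conversion, not a measured result.

```python
JOULES_PER_KWH = 3.6e6


def tokens_per_joule(tokens_per_second: float, watts: float) -> float:
    """Throughput divided by power draw; (tok/s) / W equals tok/J."""
    return tokens_per_second / watts


def kwh_per_million_tokens(tokens_per_second: float, watts: float) -> float:
    """Energy needed to generate one million tokens at a steady rate."""
    seconds = 1_000_000 / tokens_per_second
    return seconds * watts / JOULES_PER_KWH


CLUSTER_WATTS = 3500  # rough 8xA100 draw quoted in the comment

for tps in (30, 50):
    eff = tokens_per_joule(tps, CLUSTER_WATTS)
    energy = kwh_per_million_tokens(tps, CLUSTER_WATTS)
    print(f"{tps} tok/s @ {CLUSTER_WATTS} W: {eff:.4f} tok/J, {energy:.1f} kWh per 1M tokens")
```

That works out to roughly 0.009-0.014 tok/J, so the "roughly 0.01" estimate holds, and it translates to about 19-32 kWh per million tokens on those numbers.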