epolanski an hour ago

Catch up in what, exactly? Google isn't building hardware to sell; they aren't in the same market.

Also, I feel you completely misunderstand: the problem isn't how fast ONE GPU is vs ONE TPU, what matters is the cost for the same output. If I can fill a datacenter at half the cost for the same output, does it matter that I used twice the TPUs and that a single Nvidia Blackwell was faster? No...
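To make that concrete, here's a minimal sketch with entirely made-up numbers (the throughputs and prices below are hypothetical, not real Blackwell or TPU figures). The metric that matters is dollars per unit of output:

```python
# Hypothetical chips: B is twice as fast per unit, A is 4x cheaper per unit.
chip_a = {"tokens_per_sec": 500,   "unit_cost_usd": 10_000}   # slower in-house ASIC
chip_b = {"tokens_per_sec": 1_000, "unit_cost_usd": 40_000}   # faster merchant GPU

target = 1_000_000  # tokens/sec the datacenter must sustain

for name, chip in (("A", chip_a), ("B", chip_b)):
    n = target / chip["tokens_per_sec"]
    print(f"chip {name}: {n:,.0f} units, ${n * chip['unit_cost_usd']:,.0f}")

# chip A: 2,000 units, $20,000,000
# chip B: 1,000 units, $40,000,000
# Twice the chips, same output, half the hardware bill.
```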

And hardware cost isn't even the biggest problem: operational costs, mostly power and cooling, are another huge one.

So if you design a solution that fits your stack (which was designed for it) and optimize for your operational costs, you're light years ahead of a competitor using the more powerful solution that costs 5 times more in hardware and twice as much to operate.
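A rough total-cost-of-ownership sketch makes the compounding visible; the power draws, electricity price, and PUE below are all invented for illustration:

```python
# Rough 3-year TCO for the two hypothetical fleets above
# (power draws, electricity price, and PUE are assumptions).
HOURS = 3 * 365 * 24   # service life in hours
KWH_PRICE = 0.08       # USD per kWh
PUE = 1.4              # datacenter overhead for cooling, power delivery, etc.

def tco(n_chips, unit_cost_usd, watts):
    """Fleet hardware cost plus electricity, including cooling overhead."""
    energy_kwh = n_chips * watts / 1000 * HOURS * PUE
    return n_chips * unit_cost_usd + energy_kwh * KWH_PRICE

print(f"fleet A: ${tco(2_000, 10_000, 350):,.0f}")    # ~$22.1M
print(f"fleet B: ${tco(1_000, 40_000, 1_400):,.0f}")  # ~$44.1M
```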

All of this is more or less true for inference economics; I have no clue about training.

butvacuum an hour ago

Also, doesn't memory make this a bit moot? At scale, I thought the ASICs frequently sat idle waiting on memory anyway.

pests 9 minutes ago

You're doing operations on the data once it's been transferred into GPU memory: shuffling it around the various caches and processors, or feeding it into tensor cores and other matrix units. You don't want to be sitting idle.
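A back-of-the-envelope way to check whether a given op leaves the chip idle is a roofline-style calculation: compare the operation's arithmetic intensity (FLOPs per byte moved from memory) against the chip's compute-to-bandwidth ratio. The hardware figures here are illustrative, not any specific GPU or TPU:

```python
# Roofline-style check: is a matmul compute-bound or memory-bound?
PEAK_FLOPS = 1000e12          # 1000 TFLOP/s peak compute, illustrative
PEAK_BW = 4e12                # 4 TB/s memory bandwidth, illustrative
RIDGE = PEAK_FLOPS / PEAK_BW  # 250 FLOPs/byte needed to keep the chip busy

def intensity(m, k, n, bytes_per_elem=2):
    """Arithmetic intensity of an (m,k) @ (k,n) matmul in bf16."""
    flops = 2 * m * k * n                              # multiply-adds
    bytes_moved = bytes_per_elem * (m*k + k*n + m*n)   # read A and B, write C
    return flops / bytes_moved

print(intensity(8192, 8192, 8192))  # ~2731 > 250: compute-bound
print(intensity(1, 8192, 8192))     # ~1 << 250: badly memory-bound
```

Which is roughly why the "idle waiting on memory" picture holds for small-batch inference (a matrix-vector product reuses almost nothing), while big batched matmuls can keep the compute units fed.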