chermi a day ago

Ignoring energy costs(!), I'm interested in the following. Say every server generation from Nvidia is 25% "better at training", by whatever metric (1). Could you not theoretically wire together 1.25x + delta as many previous-generation servers to get the same compute? The delta accounts for the latency/bandwidth overhead of the extra interconnects. I'm guessing delta is fairly large, given my impression of how important HBM and networking are.

I don't know the efficiency gains per generation, but let's just say that getting the same compute out of this 1.25x + delta system requires 2x the energy. My impression is that while energy is a substantial cost, the total cost for a training run is still dominated by the actual hardware + infrastructure.

It seems like there must be some break-even point where you could use older-generation servers and come out ahead. Probably everyone has this figured out, and consequently the resale value of previous-gen chips is quite high?
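If it helps to make the break-even concrete, here's the back-of-envelope comparison I have in mind, as a sketch. The total_cost helper and every number in it are made up (prices, power draw, electricity rate, PUE), not real quotes:

    # Back-of-envelope: a cluster of older GPUs vs one newer GPU with the same
    # effective training throughput. All figures are placeholder assumptions.
    HOURS_PER_YEAR = 24 * 365
    ELEC_USD_PER_KWH = 0.08   # assumed electricity price
    PUE = 1.3                 # assumed datacenter power overhead
    YEARS = 4                 # assumed useful life

    def total_cost(hardware_usd, power_kw):
        """Hardware cost plus energy cost over the assumed lifetime."""
        energy_usd = power_kw * HOURS_PER_YEAR * YEARS * PUE * ELEC_USD_PER_KWH
        return hardware_usd + energy_usd

    # New gen: one unit at full price, drawing 1 kW (assumed).
    new_gen = total_cost(hardware_usd=40_000, power_kw=1.0)

    # Old gen: 1.25x + delta units bought used at a discount, drawing ~2x the
    # energy for the same effective compute (the assumption above).
    old_gen = total_cost(hardware_usd=1.4 * 15_000, power_kw=2.0)

    print(f"new gen: ${new_gen:,.0f}   old gen: ${old_gen:,.0f}")

With these particular placeholders the hardware term dominates and the used cluster comes out ahead; raise the resale price, the electricity rate, or the power multiplier and the conclusion flips, which is exactly the break-even I'm asking about.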

What's the lifespan at full load of these servers? I think I read CoreWeave depreciates them (somewhat controversially) over 4 years.

Assuming the chips last long enough, even if they're not usable for LLM training or serving inference, can't they be reused for scientific workloads? I'm not exactly old, but back in my PhD days we were building our own little GPU clusters for MD simulations. I don't think long MD simulations are the best use of compute these days, but there are many similar problems, like weather modeling, high-dimensional optimization, materials/radiation studies, and generic simulations like FEA or simply large systems of ODEs.

Are these big clusters being turned into hand-me-downs for other scientific/engineering problems like the ones above, or are they simply run until they burn out? What's a realistic expected lifespan for a B200? Or maybe it's as simple as they immediately turn their last-gen servers over to serving inference?

Lots of questions, but my main one is just how much the hardware is devalued once it becomes previous-gen. Any guidance/references appreciated!

Also, for anyone still in the academic computing world: do groups like D. E. Shaw still exist, trying to run massive MD simulations or similar? Do the big national computing centers use the latest and greatest big Nvidia AI servers, or something a little more modest? Or maybe they're even still just massive CPU clusters?

While I have the attention of anyone who might know: whatever happened to that fad from 10+ years ago saying a lot of compute/algorithms would be shifting toward more memory-heavy models (2)? Seems like it kind of happened, in AI at least.

(1) Yes I know it's complicated, especially with memory stuff.

(2) I wanna say it was IBM Almaden championing the idea.

SchemaLoad a day ago | parent

I'm not the one building out datacenters, but I believe power consumption is the reason for the devaluation. It's the same reason we saw Bitcoin miners throw all their ASICs in the bin every 6 months: at some point it becomes cheaper to buy new hardware than to keep running the old, inefficient chips, i.e. once the lifetime power savings of the new chips exceed their purchase price.

These AI data centers are chewing up unimaginable amounts of power, so if Nvidia releases a new chip that does the same work at half the power consumption, that whole datacenter of GPUs is massively devalued.
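Just to show the shape of that break-even, here's a throwaway worth_replacing helper. None of its inputs are real quotes; you'd plug in your own prices and power figures:

    # Replace-vs-keep condition: upgrading pays for itself on power alone once
    #   (old_power_kw - new_power_kw) * hours * years * PUE * $/kWh > new_hw_usd
    # All arguments are placeholders, not real prices or power figures.
    HOURS_PER_YEAR = 24 * 365

    def worth_replacing(old_power_kw, new_power_kw, new_hw_usd,
                        years, usd_per_kwh, pue=1.3):
        """True if lifetime energy savings exceed the new hardware's price."""
        savings_usd = ((old_power_kw - new_power_kw) * HOURS_PER_YEAR
                       * years * pue * usd_per_kwh)
        return savings_usd > new_hw_usd

Whether it comes out true depends on the electricity price, the assumed lifetime, and the size of the power gap; at datacenter scale those energy terms get multiplied across an enormous number of chips.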

The whole AI industry is looking like there won't be a first-mover advantage; if anything there will be a late-mover advantage, where you can buy the better chips and skip burning money on the old generations.