airhangerf15 7 days ago

An H100 is a $20k card with 80GB of VRAM. Imagine a 2U rack server with $100k of these cards in it. Now imagine an entire rack of these things, plus all the other components (CPUs, RAM, passive or water cooling), and you're talking $1 million per rack, not including the cost to run them or the engineers needed to maintain them.

I don't think people realize the size of these compute units.

When the AI bubble pops is when you're likely to be able to realistically run good local models. I imagine some of these $100k servers going for $3k on eBay in 10 years, and a lot of electricians being asked to install new 240V connectors in makeshift server rooms or garages.

semi-extrinsic 7 days ago | parent | next [-]

What do you mean 10 years?

You can pick up a DGX-1 on eBay right now for less than $10k: 256 GB of VRAM (HBM2, no less), NVLink, 512 GB of RAM, 40 CPU cores, 8 TB SSD, 100 Gbit HBAs. Equivalent non-Nvidia-branded machines are around $6k.

They are heavy, noisy like you would not believe, and a single one just about maxes out a 16A 240V circuit. Which also means it produces 13 000 BTU/hr of waste heat.

kj4ips 7 days ago | parent | next [-]

Fair warning: the BMCs on those suck so bad, and the firmware bundles are painful, since you need a working Nvidia-specific container runtime to apply them, which you might not be able to get up and running because of a firmware bug that presents almost all the RAM as nonvolatile.

iJohnDoe 7 days ago | parent [-]

Are there better paths you would suggest? Any hardware people have reported better luck with?

kj4ips 7 days ago | parent [-]

Honestly, unless you //really// need nvlink/ib (meaning that copies and pcie trips are your bottleneck), you may do better with whatever commodity system with sufficient lanes, slots, and CFM is available at a good price.

ksherlock 7 days ago | parent | prev | next [-]

It's not waste heat if you only run it in the winter.

hdgvhicv 7 days ago | parent | next [-]

Only if you ignore that both gas furnaces and heat pumps are more efficient than resistive loads.

tgma 7 days ago | parent | next [-]

Heat pump, sure, but how is a gas furnace more efficient than a resistive load inside the house? Do you mean more economical rather than more efficient (due to gas being much cheaper per unit of energy)?

meatmanek 7 days ago | parent | next [-]

Depends where your electricity comes from. If you're burning fossil fuels to make electricity, that's only about 40% efficient, so you need to burn 2.5x as much fuel to get the same amount of heat into the house.
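
A back-of-envelope sketch of that fuel accounting (Python; the plant efficiency, furnace efficiency, and heat-pump COP are assumed round numbers, not measurements):

    # Fuel burned to deliver 1 kWh of heat indoors, under assumed numbers.
    heat_needed = 1.0        # kWh of heat delivered into the house

    plant_eff = 0.40         # assumed fossil power plant efficiency
    furnace_eff = 0.95       # assumed condensing gas furnace efficiency
    heat_pump_cop = 3.0      # assumed heat pump coefficient of performance

    fuel_resistive = heat_needed / plant_eff                  # 2.5 kWh of fuel
    fuel_furnace = heat_needed / furnace_eff                  # ~1.05 kWh of fuel
    fuel_heat_pump = heat_needed / heat_pump_cop / plant_eff  # ~0.83 kWh of fuel

On a fossil grid the furnace beats resistive heat handily, and a heat pump beats both, which matches the rest of the thread.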

tgma 7 days ago | parent | next [-]

Sure. That has nothing to do with the efficiency of your system though. As far as you are concerned this is about your electricity consumption for the home server vs gas consumption. In that sense resistive heat inside the home is 100% efficient compared to gas furnace; the fuel cost might be lower on the latter.

mlyle 7 days ago | parent [-]

Sure, it's "equally efficient" if you ignore the inefficient thing that is done outside where you draw the system box, directly in proportion to how much you do it.

Heating my house with a giant diesel-powered radiant heater from across the street is infinitely efficient, too, since I use no power in my house.

tgma 7 days ago | parent [-]

If you don’t close the box of the system at some point to isolate the input, efficiency would be meaningless. In the context of the original post (that running a server in winter would be zero-waste if you need the heat anyway), it is perfectly clear that the input is electricity to your home at a certain $/kWh and gas at a certain $/BTU. Under that premise, the claim fails if you have a heat pump deployed, but it holds against a gas furnace in terms of efficiency (energy consumed per unit of heat), though not necessarily economically.

hdgvhicv 7 days ago | parent | next [-]

Generating 1 kWh of heat with resistive electric is more expensive than gas, which itself is more expensive than a heat pump, based on the cost of the fuel going in.

If your grid is fossil fuels burning the fuel directly is more efficient. In all cases a heat pump is more efficient.

mlyle 6 days ago | parent | prev [-]

I think this is pretty silly either way.

- There's an upstream loss on electricity directly in proportion to how much you use; ignoring this tilts the analysis in favor of electricity.

- You pay more for heat from electricity than gas, in part because of this loss.

devmor 7 days ago | parent | prev [-]

It’d be fun to actually calculate this efficiency. My local power is mostly nuclear so I wonder how that works out.

fulafel 7 days ago | parent | prev [-]

You accelerate the climate catastrophe so there's less need for heating in the long run.

Tade0 7 days ago | parent | prev [-]

I'm in the market for an oven right now, and 230V/16A is exactly the voltage/current the one I'll probably be getting runs at.

At 90°C you can do sous vide, so basically use that waste heat entirely.

For such temperatures you'd need a CO2 heat pump, which is still expensive. I don't know about gas, as I don't even have a line to my place.

_zoltan_ 7 days ago | parent | next [-]

90C for sous vide??? You're going to kill any meal at 90.

Tade0 6 days ago | parent [-]

Make it "up to 90°C". 5th quarter meats are better done in the higher end of sous vide temperatures.

Point being, you can throttle your equipment to the desired temperature and use that energy effectively.

mewpmewp2 7 days ago | parent | prev [-]

How can you bear to eat sous vide though? I've tried it for months, even years, and I still find it troublesome. So mushy, nothing to enjoy.

SAI_Peregrinus 7 days ago | parent | next [-]

Did you skip searing it after sous vide? Did you sous vide it to the "instantly kill all bacteria" temperature (145°F for steak) thereby overcooking & destroying it, or did you sous vide to a lower temperature (at most 125°F) so that it'd reach a medium-rare 130°F-140°F after searing & carryover cooking during resting? It should have a nice seared crust, and the inside absolutely shouldn't be mushy.

brookst 7 days ago | parent | prev [-]

Please research this. Done right, sous vide is amazing. But it is almost never the only technique used. Just like when you slow-roast a prime rib at 200°F, you MUST sear to get the Maillard reaction and a satisfying texture.

energy123 7 days ago | parent | prev [-]

Seasonality in git commit frequency

eulgro 7 days ago | parent | prev | next [-]

> 13 000 BTU/hr

In sane units: 3.8 kW
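
For anyone checking the unit jokes in this subthread, the conversions (Python, standard factors):

    btu_hr = 13_000
    kw = btu_hr * 0.29307 / 1000  # 1 BTU/hr = 0.29307 W -> ~3.81 kW
    tons = btu_hr / 12_000        # 1 ton of refrigeration = 12,000 BTU/hr -> ~1.083
    hp = kw * 1000 / 745.7        # 1 hp = 745.7 W -> ~5.11
    print(kw, tons, hp)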

andy99 7 days ago | parent | next [-]

You mean 1.083 tons of refrigeration

Skunkleton 7 days ago | parent | prev | next [-]

> In sane units: 3.8 kW

5.1 Horsepower

amy214 7 days ago | parent | next [-]

> > In sane units: 3.8 kW

> 5.1 Horsepower

0-60 in 1.8 seconds

oblio 7 days ago | parent [-]

Again, in sane units:

0-100 in 1.92 seconds

_kb 7 days ago | parent | prev | next [-]

3.8850 poncelet

ta12653421 7 days ago | parent | prev [-]

But ... can it run Crysis?

:D

UnnoTed 6 days ago | parent [-]

It makes you run into a crysis

markdown 7 days ago | parent | prev | next [-]

How many football fields of power?

semi-extrinsic 7 days ago | parent | prev [-]

The choice of BTU/hr was firmly tongue in cheek for our American friends.

quickthrowman 7 days ago | parent | prev | next [-]

You’ll need (2) 240V 20A 2P breakers, one for the server and one for the 1-ton mini-split to remove the heat ;)

Dylan16807 7 days ago | parent | next [-]

Matching AC would only need 1/4 the power, right? If you don't already have a method to remove heat.

quickthrowman 7 days ago | parent [-]

Cooling BTUs already take the coefficient of performance of the vapor-compression cycle into account. 4w of heat removed for each 1w of input power is around the max COP for an air cooled condenser, but adding an evaporative cooling tower can raise that up to ~7.

I just looked at a spec sheet for a 230V single-phase 12k BTU mini-split: the minimum circuit ampacity was 3A for the air handler and 12A for the condenser. Add those together for 15A, divide by 0.8 for 18.75A, and the next size up is 20A. Minimum circuit ampacity is (roughly) the sum of the full-load amps of the motor(s) inside the piece of equipment times 1.25, and it determines the conductor size required to power the equipment.

So the condensing unit likely draws ~9.5-10A max and the air handler around ~2.4A, and both will have variable speed motors that would probably only need about half of that to remove 12k BTU of heat, so ~5-6A or thereabouts should do it, which is around 1/3rd of the 16A server, or a COP of 3.
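
A minimal sketch of that sizing arithmetic (Python; simplified relative to the actual NEC rules, using the spec-sheet numbers above):

    # Mirrors the comment's arithmetic; real NEC sizing has more cases.
    mca_air_handler = 3.0   # A, from the cited spec sheet
    mca_condenser = 12.0    # A

    total_mca = mca_air_handler + mca_condenser  # 15 A
    continuous = total_mca / 0.8                 # 18.75 A (125% continuous-load rule)

    standard_breakers = [15, 20, 25, 30, 40, 50]
    breaker = next(b for b in standard_breakers if b >= continuous)
    print(breaker)  # -> 20 A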

Dylan16807 7 days ago | parent [-]

Well, I don't know why that unit wants so many amps. The first 12k BTU window unit I looked at on Amazon uses 12A at 115V.

quickthrowman 5 days ago | parent [-]

That is probably just bad data entry at Amazon. I don’t ever trust the specification data on Amazon, I look for the manufacturer’s spec sheet/cutsheet.

In this case, 12A is the maximum continuous load allowed on a 15A breaker. The unit itself probably uses between 900-1000w (7.5A to 8.3A), the spec sheet might say 12A to encourage a dedicated circuit for the A/C unit which then gets added to Amazon’s specs on their website.

Dylan16807 5 days ago | parent [-]

I think I finally found an actual product page: https://bdachelp.zendesk.com/hc/en-us/articles/2319602600002...

The Amazon page specifically said 1354 watts, but I think that's actually for the 14300 BTU model. The 12000 BTU one is 9.72 amps.

Anyway, doesn't this make my actual argument stronger? These units fit even better into a normal circuit than I thought, and make the mini-split look even worse in comparison.

quickthrowman 4 days ago | parent [-]

4.5-5A at 240V = 9.72A at 120V

It’s the same level of power consumption. I’m not even sure what you’re asking at this point, to be honest.

Dylan16807 4 days ago | parent [-]

You were talking about needing a second 240V 20A circuit, and you later backed that up by citing the spec sheet of 230V mini-split with a minimum circuit rating of 15A.

My argument was that you do not need such a circuit.

quickthrowman 4 days ago | parent [-]

Technically you’re correct: a 12000 BTU minisplit only uses around 1000 watts while running, which is just over 4A at 240V.

The breaker size being 20A 2P is a consequence of the NEC requiring you to size the wire based off the equipment nameplate rating of 15A, which is based off the full load amps of the motors inside the equipment.

Full load amps is the max amount of current a motor can draw at a specific voltage and is used for sizing wire and overcurrent protection for a piece of equipment. It doesn’t always match up the current a motor draws while it’s running normally. You take full load amps times 1.25 to get minimum circuit ampacity, which you use to size the conductors.

So while you are correct that a 240V 12000 BTU minisplit won't draw anywhere near 20A, the specific minisplit I looked at required a 20A breaker due to the minimum circuit ampacity being 15A. If the MCA was 12A, you could use a 15A breaker; an MCA of 8A would allow using a 10A breaker, and so on.

If you use fuses, you can size the overcurrent protection at 100%, breakers require 125% of the load for a continuous load. So you could use a 30A fusible disconnect switch fused at 15A for a unit with an MCA of 15A.

Dylan16807 4 days ago | parent [-]

That's not the angle I'm taking. I'm not saying anything about what the mini-split actually uses. Give it the circuit that the nameplate asks for.

Instead I'm saying that particular minisplit is a lazy design and we can get a 12000 or higher BTU unit with a much smaller nameplate rating. Not only will it only need a single-pole breaker, the required circuit probably already exists.

Scoundreller 7 days ago | parent | prev | next [-]

Just air-freight them from 60 degrees North to 60 degrees South and vice versa every 6 months.

kelnos 7 days ago | parent | prev [-]

Well, get a heat pump with a good COP of 3 or more, and you won't need quite as much power ;)

7 days ago | parent [-]
[deleted]
xtiansimon 6 days ago | parent | prev | next [-]

> “They are heavy, noisy like you would not believe, … produces … waste heat.”

Haha. I bought a 20 yro IBM server off eBay for a song. It was fun for a minute. Soon became a doorstop and I sold it as pickup-only on eBay for $20. Beast. Never again have one in my home.

yencabulator 6 days ago | parent | next [-]

That's about the era when my company was an IBM reseller. Once I was kneeling behind 8x 1U servers starting up and all the fans went to max speed for 3 seconds. Never put rackmount hardware in a room near anything living.

guenthert 6 days ago | parent | prev [-]

Get an AS400. Those were actually expected to be installed in an office, rather than a server room. Might still be perceived as loud at home, but won't be deafening and probably not louder than some gaming rigs.

CamperBob2 7 days ago | parent | prev | next [-]

Are you talking about the guy in Temecula running two different auctions with some of the same photos (356878140643 and 357146508609, both showing a missing heat sink?) Interesting, but seems sketchy.

How useful is this Tesla-era hardware on current workloads? If you tried to run the full DeepSeek R1 model on it at (say) 4-bit quantization, any idea what kind of TTFT and TPS figures might be expected?

oceanplexian 7 days ago | parent | next [-]

I can’t speak to the Tesla stuff, but I run an Epyc 7713 with a single 3090, and by creatively splitting the model between the GPU and 8 channels of DDR4 I can do about 9 tokens per second on a q4 quant.
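
For what it's worth, that lines up with a bandwidth-bound estimate. A sketch, assuming the q4 quant is full DeepSeek R1 with ~37B active parameters per token and ignoring the 3090's contribution (the poster doesn't confirm the exact model):

    active_params = 37e9     # R1 activates ~37B of its 671B params per token
    bytes_per_param = 0.5    # 4-bit quantization
    bytes_per_token = active_params * bytes_per_param  # ~18.5 GB read per token

    ddr4_bw = 8 * 25.6e9     # 8 channels of DDR4-3200: ~205 GB/s peak

    print(ddr4_bw / bytes_per_token)  # ~11 tokens/s theoretical ceiling

An observed 9 tokens/s against an ~11 tokens/s ceiling is about what you'd expect from a memory-bound workload.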

CamperBob2 7 days ago | parent [-]

Impressive. Is that a distillation, or the real thing?

justincormack 6 days ago | parent | prev [-]

Tesla doesn't support 4-bit float.

nulltype 2 days ago | parent | prev [-]

> What do you mean 10 years?

Didn’t the DGX-1 come out 9 years ago?

invaliduser 7 days ago | parent | prev | next [-]

Even if the AI bubble does not pop, your prediction about those servers being available on eBay in 10 years will likely be true, because some datacenters will simply upgrade their hardware and resell the old units to third parties.

potatolicious 7 days ago | parent | next [-]

Would anybody buy the hardware though?

Sure, datacenters will get rid of the hardware, but only because it's no longer commercially profitable to run them, presumably because compute demands have eclipsed their abilities.

It's kind of like buying a used GeForce 980Ti in 2025. Would anyone buy one and run it except out of nostalgia or curiosity? Just the power draw makes them uneconomical to run.

Much more likely every single H100 that exists today becomes e-waste in a few years. If you have need for H100-level compute you'd be able to buy it in the form of new hardware for way less money and consuming way less power.

For example if you actually wanted 980Ti-level compute in a desktop today you can just buy a RTX5050, which is ~50% faster, consumes half the power, and can be had for $250 brand new. Oh, and is well-supported by modern software stacks.

CBarkleyU 7 days ago | parent | next [-]

Off topic, but I bought my (still in active use) 980ti literally 9 years ago for that price. I know, I know, inflation and stuff, but I really expected more than 50% bang for my buck after 9 whole years…

nucleardog 7 days ago | parent | prev | next [-]

> Sure, datacenters will get rid of the hardware, but only because it's no longer commercially profitable to run them, presumably because compute demands have eclipsed their abilities.

I think the existence of a pretty large secondary market for enterprise servers and such kind of shows that this won't be the case.

Sure, if you're AWS and what you're selling _is_ raw compute, then couple generation old hardware may not be sufficiently profitable for you anymore... but there are a lot of other places that hardware could be applied to with different requirements or higher margins where it may still be.

Even if they're only running models a generation or two out of date, there are a lot of use cases today, with today's models, that will continue to work fine going forward.

And that's assuming it doesn't get replaced for some other reason that only applies when you're trying to sell compute at scale. A small uptick in the failure rate may make a big dent at OpenAI but not for a company that's only running 8 cards in a rack somewhere and has a few spares on hand. A small increase in energy efficiency might offset the capital outlay to upgrade at OpenAI, but not for the company that's only running 8 cards.

I think there's still plenty of room in the market in places where running inference "at cost" would be profitable that are largely untapped right now because we haven't had a bunch of this hardware hit the market at a lower cost yet.

7 days ago | parent [-]
[deleted]
nullc 7 days ago | parent | prev | next [-]

I have around a thousand Broadwell cores in 4-socket systems that I got for ~nothing from these sorts of sources... pretty useful. (I mean, I guess literally nothing, since I extracted the storage backplanes and sold them for more than the systems cost me.) I try to run tasks during low-power-cost hours on Zen 3/4 unless they'd take weeks even running on those, and if they will I crank up the rest of the cores.

And 40 P40 GPUs that cost very little, which are a bit slow, but with 24GB per GPU they're pretty useful for memory-bandwidth-bound tasks (and not horribly noncompetitive in terms of watts per TB/s).

Given highly variable time-of-day power pricing, it's also pretty useful to get 2x the computing power (at low cost) and just run it during the low-cost periods.

So I think datacenter scrap is pretty useful.

mindslight 7 days ago | parent | prev | next [-]

It's interesting to think about scenarios where that hardware would get used only part of the time, like say when the sun is shining and/or when dwelling heat is needed. The biggest sticking point would seem to be all of the capex for connecting them to do something useful. It's a shame that PLX switch chips are so expensive.

airhangerf15 7 days ago | parent | prev [-]

The 5050 doesn't support 32-bit PhysX, so a bunch of games would be missing a ton of stuff. You'd still need the 980 running alongside it for older PhysX games, because Nvidia.

belter 7 days ago | parent | prev | next [-]

Except their insane electricity demands will still be the same, meaning nobody will buy them. You can find plenty of SPARC servers on eBay.

cicloid 7 days ago | parent [-]

There is also a community of users known for not making sane financial decisions and keeping older technologies working in their basements.

dijit 7 days ago | parent [-]

But we are few, and fewer still who will go for high power consumption devices with esoteric cooling requirements that generate a lot of noise.

DecentShoes 7 days ago | parent | prev | next [-]

This seems likely. Blizzard even sold off old World of Warcraft servers. You can still get them on eBay.

mattmanser 7 days ago | parent | prev [-]

Someone's take on AI was that we're collectively investing billions in data centers that will be utterly worthless in 10 years.

Unlike the investments in railways, telephone cables, roads, or any other sort of infrastructure, this investment has a very short lifespan.

Their point was that whatever your take on AI, the present investment in data centres is a ridiculous waste and will always end up as a huge net loss compared to most other investments our societies could spend it on.

Maybe we'll invent AGI and they'll be proven wrong, as the data centers will pay for themselves many times over, but I suspect they'll ultimately be proved right and it'll all end up as landfill.

toast0 7 days ago | parent | next [-]

The servers may well be worthless (or at least worth a lot less), but that's been pretty much true for a long time. Not many people want to run 10-year-old servers (although I pay $30/month for a dedicated server that's a dual Xeon L5640 or something like that, which is about 15 years old).

The servers will be replaced, the networking equipment will be replaced. The building will still be useful, the fiber that was pulled to internet exchanges/etc will still be useful, the wiring to the electric utility will still be useful (although I've certainly heard stories of datacenters where much of the floor space is unusable, because power density of racks has increased and the power distribution is maxed out)

hattmall 7 days ago | parent [-]

I have a server in my office from 2009 that's still far more economical to run than buying any sort of cloud compute. By at least an order of magnitude.

alexandre_m 7 days ago | parent [-]

Perhaps if you only need to run some old PHP app.

What kind of disk and how much memory is in there?

hattmall 5 days ago | parent [-]

72GB of RAM, 4x 15K SCSI drives, I think. Yeah, it's not doing anything crazy: running a lot of virtual machines, random servers; probably the most intense thing is video transcoding. It works well though, and like I said, way cheaper than running the same stuff on cloud infrastructure. I think I bought it for about $500 about 10 years ago. I started saving about $76 a month just from moving virtual desktops off AWS, so it easily paid for itself in a year.

bespokedevelopr 7 days ago | parent | prev | next [-]

If it is all a waste and a bubble, I wonder what the long-term impact will be of the infrastructure upgrades around these DCs. A lot of new HV wires and substations are being built out. Cities are expanding around clusters of DCs. Are they setting themselves up for a new rust belt?

abeyer 7 days ago | parent | next [-]

Or early provisioning for massively expanded electric transit and EV charging infrastructure, perhaps.

thenthenthen 6 days ago | parent | prev | next [-]

There are a lot of examples of former industrial sites (rust belts) that are now being redeveloped into data center sites, because the infra is partly there already and the setting can be favorable politically, environmentally, and geographically. For example, many old industrial sites relied on water for cooling and transportation; that water can now cool data centers. I think you're onto something though, if you depart from the history of these places and extrapolate into the future.

hirvi74 7 days ago | parent | prev [-]

Maybe the DCs could be turned into some mean cloud gaming servers?

dortlick 7 days ago | parent | prev | next [-]

Sure, but what about the collective investment in smartphones, digital cameras, laptops, even cars? Not much modern technology is useful and practical after 10 years, let alone 20. AI is probably moving a little faster than normal, but technology depreciation is not limited to AI.

gscott 7 days ago | parent | prev | next [-]

If a coal-powered electric plant is next to the datacenter, you might be able to get electricity cheap enough to keep it going.

Datacenters could go into the business of building personal PCs or workstations from the older NVIDIA cards and selling them.

jonplackett 7 days ago | parent | prev | next [-]

They probably are right, but a counterargument could be that people thought going to the moon was pointless and insanely expensive, yet the technology to put stuff in space and have GPS and comms satellites probably paid that back 100x.

vl 7 days ago | parent | next [-]

The reality is that we don’t know how much of a trope this statement is.

I think we would have gotten all this technology without going to the moon or the Space Shuttle program. GPS, for example, was initially developed for military applications.

DaiPlusPlus 7 days ago | parent | prev | next [-]

I don’t mean to invalidate your point (about genuine value arising from innovations originating from the Apollo program), but GPS and comms satellites (and heck, the Internet) are all products of nuclear weapons programs rather than civilian space exploration programs (ditto the Space Shuttle, and I could go on…).

CamperBob2 7 days ago | parent [-]

Yes, and no. The people working on GPS paid very close attention to the papers from JPL researchers describing their timing and ranging techniques for both Apollo and deep-space probes. There was more cross-pollination than meets the eye.

somenameforme 7 days ago | parent | prev [-]

It's not that going to the Moon was pointless, but stopping after we'd done little more than plant a flag was. Wernher von Braun was the chief architect of the Apollo program, and the Moon was intended as little more than a stepping stone toward a permanent colony on Mars. Incidentally, this is also the technical and ideological foundation of what would become the Space Shuttle and the ISS, which were both also supposed to be little more than small-scale tools on this mission, as opposed to ends in and of themselves.

Imagine if Columbus verified that the New World existed, planted a flag, came back - and then everything was cancelled. Or similarly for literally any colonization effort ever. That was the one downside of the space race - what we did was completely nonsensical, and made sense only because of the context of it being a 'race' and politicians having no greater vision than beyond the tip of their nose.

jonplackett 6 days ago | parent [-]

I’ve been enjoying that Apple TV show with alternative history as if we’d kept going. It’s kinda dumb in parts but still fun to imagine!

somenameforme 5 days ago | parent [-]

For All Mankind. I tried getting into that, but the identity politics stuff (at least in the first season) was way too intense for me. I'm not averse to it at all in practice (Deep Space Nine is one of my favorite series of all time) but, for me, it went way beyond the line from advocacy to preachiness.

pbh101 7 days ago | parent | prev | next [-]

This isn’t my original take but if it results in more power buildout, especially restarting nuclear in the US, that’s an investment that would have staying power.

mensetmanusman 7 days ago | parent | prev [-]

Utterly? Moore's law per power requirement is dead; lower-power units can run electric heating for small towns!

torginus 7 days ago | parent | prev | next [-]

My personal sneaking suspicion is that publicly offered models use way less compute than people think. Modern mixture-of-experts models do top-k routing, where only some experts are evaluated per token, meaning even SOTA models may not use much more compute than a 70-80B non-MoE model.
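
A toy sketch of that top-k routing (Python/NumPy; the sizes are made up and not from any real model):

    import numpy as np

    # Top-k MoE routing: only k of n experts run per token, so active
    # compute scales with k, not with the total expert count n.
    n_experts, k, d = 8, 2, 16
    rng = np.random.default_rng(0)

    x = rng.normal(size=d)                        # one token's hidden state
    router = rng.normal(size=(d, n_experts))      # gating weights
    experts = rng.normal(size=(n_experts, d, d))  # expert weight matrices

    logits = x @ router
    top = np.argsort(logits)[-k:]                 # indices of the k best experts
    gates = np.exp(logits[top]) / np.exp(logits[top]).sum()  # softmax over top-k

    # Only the selected experts are evaluated at all.
    y = sum(g * (x @ experts[i]) for g, i in zip(gates, top))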

ActorNightly 7 days ago | parent | prev | next [-]

To piggyback on this: at the enterprise level in the modern age, the question is really not "how are we going to serve all these users". It comes down to the fact that investors believe that eventually they will see a return on investment, and will pay whatever is needed to get the infra.

Even if you didn't have optimizations involved in terms of job scheduling, they would just build as many warehouses as necessary filled with as many racks as necessary to serve the required user base.

brikym 7 days ago | parent | prev | next [-]

As a non-American the 240V thing made me laugh.

thecommakozzi 6 days ago | parent [-]

[dead]

eitally 7 days ago | parent | prev | next [-]

What I wonder is what this means for Coreweave, Lambda and the rest, who are essentially just renting out fleets of racks like this. Does it ultimately result in acquisition by a larger player? Severe loss of demand? Can they even sell enough to cover the capex costs?

cootsnuck 7 days ago | parent | next [-]

It means they're likely going to be left holding a very expensive bag.

adw 7 days ago | parent | prev [-]

These are also depreciating assets.

7 days ago | parent | prev | next [-]
[deleted]
torginus 7 days ago | parent | prev | next [-]

I wonder if it's feasible to hook up NAND flash with the kind of high-bandwidth link needed for inference.

Each of these NAND chips has hundreds of dies of flash stacked inside, and they are all hooked up to the same data line, so just one of them can talk at a time, yet they still achieve >1GB/s of bandwidth. If you could hook the dies up in parallel, you could get hundreds of GB/s of bandwidth per chip.

potatolicious 7 days ago | parent [-]

NAND is very, very slow relative to RAM, so you'd pay a huge performance penalty there. But maybe more importantly my impression is that memory contents mutate pretty heavily during inference (you're not just storing the fixed weights), so I'd be pretty concerned about NAND wear. Mutating a single bit on a NAND chip a million times over just results in a large pile of dead NAND chips.

torginus 7 days ago | parent [-]

No, it's not slow: a single NAND chip in an SSD offers >1GB/s of bandwidth. Inside the chip there are 100+ dies actually holding the data, but in SSDs only one of them is active when reading/writing.

You could probably make special NAND chips where all the dies can be active at the same time, which means you could get 100GB/s+ out of a single chip.

This would be useless for data-storage scenarios, but very useful when you have huge amounts of static data you need to read quickly.
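
The arithmetic behind that claim, with the die count and per-die rate as assumptions rather than vendor specs:

    dies_per_package = 100  # stacked dies in one NAND package (assumed)
    per_die_bw = 1e9        # ~1 GB/s when a single die drives the shared bus

    today = per_die_bw                        # shared channel: ~1 GB/s
    parallel = dies_per_package * per_die_bw  # all dies at once: ~100 GB/s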

slickytail 7 days ago | parent [-]

The memory bandwidth on an H100 is 3TB/s, for reference. This number is the limiting factor in the size of modern LLMs. 100GB/s isn't even in the realm of viability.
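
To put those numbers in decoding terms, a crude roofline sketch (Python; the model size is an illustrative assumption, and it assumes each generated token streams the full weights once):

    model_bytes = 70e9 * 0.5  # e.g. a dense 70B model at 4-bit -> 35 GB

    for name, bw in [("H100 HBM", 3e12), ("hypothetical parallel NAND", 100e9)]:
        print(name, round(bw / model_bytes, 1), "tokens/s ceiling")
    # H100: ~85.7 tokens/s; parallel NAND: ~2.9 tokens/s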

torginus 7 days ago | parent | next [-]

That bandwidth is for the whole GPU, which has six memory chips. But anyway, what I'm proposing isn't for the high end and training; it's for making inference cheap.

And I was somewhat conservative with the numbers: a modern budget SSD with a single NAND chip can do more than 5GB/s read speed.

RagnarD 6 days ago | parent | prev | next [-]

An RTX 6000 Pro (Nvidia Blackwell GPU) has 96GB of VRAM and can be had for around $7,700 currently (at least, the lowest price I've found). It plugs into standard PC motherboard PCIe slots. The Max-Q edition has slightly less performance but a max TDP of only 300W.

dboreham 7 days ago | parent | prev | next [-]

They'll be in landfill in 10 years.

neko_ranger 7 days ago | parent | prev | next [-]

Four H100s in a 2U server didn't sound impressive, but that is accurate:

>A typical 1U or 2U server can accommodate 2-4 H100 PCIe GPUs, depending on the chassis design.

>In a 42U rack with 20x 2U servers (allowing space for switches and PDU), you could fit approximately 40-80 H100 PCIe GPUs.
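
Taking those quoted numbers at face value, the rack-level totals (Python):

    servers_per_rack = 20        # 2U servers, leaving room for switches and PDU
    gpus = servers_per_rack * 4  # up to 80 H100s per rack
    vram_tb = gpus * 80 / 1000   # 80GB each -> 6.4 TB of GPU memory
    print(gpus, vram_tb)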

michaelt 7 days ago | parent | next [-]

Why stop at 80 H100s for a mere 6.4 terabytes of GPU memory?

Supermicro will sell you a full rack loaded with servers [1] providing 13.4 TB of GPU memory.

And with 132kW of power draw, you can heat an Olympic-sized swimming pool by 1°C every day with that rack alone. That's almost as much power consumption as 10 mid-sized cars cruising at 50 mph.
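
The pool figure roughly checks out, assuming a 2,500 m³ Olympic pool (Python):

    pool_joules = 2_500_000 * 4186            # kg of water * J/(kg*K): ~1.05e10 J per °C
    rack_joules_per_day = 132_000 * 86_400    # 132 kW for 24 h: ~1.14e10 J
    print(rack_joules_per_day / pool_joules)  # ~1.09 °C per day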

[1] https://www.supermicro.com/en/products/system/gpu/48u/srs-gb...

procaryote 7 days ago | parent | next [-]

> as much power consumption as 10 mid-sized cars cruising at 50 mph

Imperial units are so weird

handfuloflight 7 days ago | parent | prev [-]

What about https://www.cerebras.ai/system?

jzymbaluk 7 days ago | parent | prev [-]

And the big hyperscaler cloud providers are building city-block-sized data centers stuffed to the gills with these racks, as far as the eye can see.

tootie 7 days ago | parent | prev | next [-]

Yeah, I think the crux of the issue is that ChatGPT is serving a huge number of users, including paid users, and is still running at a massive operating loss. They are spending truckloads of money on GPUs and selling access at a loss.

scarface_74 7 days ago | parent | prev | next [-]

This isn’t like how Google was able to buy up dark fiber cheaply and use it.

From what I understand, this hardware has a high failure rate over the long term, especially because of the heat it generates.

shusaku 7 days ago | parent | prev | next [-]

> When the AI bubble pops is when you're likely to be able to realistically run good local models.

After years of “AI is a bubble, and will pop when everyone realizes they’re useless plagiarism parrots”, it’s nice to move to the “AI is a bubble, and will pop when it becomes completely open and democratized” phase.

cootsnuck 7 days ago | parent [-]

It's not even been 3 years. Give it time. The entire boom and bust of the dot-com bubble took 7 years.

wakamana 7 days ago | parent | prev [-]

[dead]