why_at 5 hours ago

Maybe I'm being dumb, but I don't understand what the innovation is here.

I get that they're using liquid coolant at higher than usual temperatures, but why couldn't they do that before? Most of the comparison in the article is for air cooled datacenters but what about other liquid cooled ones?

Surely in all the previous datacenters that have been designed there has been someone doing the math and determining what temperature things need to run at, how much energy it will use, how much heat it all will produce, etc.

edit: just saw this:

>Previous liquid-cooled servers were hybrid: GPUs and CPUs got cold plates, but the rest of the system stayed air-cooled, with finned heat sinks designed to shed heat into moving air. In a fully liquid-cooled server, the cooling for these components needed to be completely redesigned to use liquid.

toast0 3 hours ago | parent | next [-]

> Surely in all the previous datacenters that have been designed there has been someone doing the math and determining what temperature things need to run at, how much energy it will use, how much heat it all will produce, etc.

It seemed like a pretty big deal around 2011 when big companies started running their (air-cooled) datacenters closer to 95F (35C) vs. the traditional 72F (22C). So jumping up a little more is maybe not super exciting, but it's still innovation.

mistercow 2 hours ago | parent | next [-]

And I think the answer to the "doing the math" question is, until you've actually collected the data, "what math?" Until someone actually puts a bunch of six-figure value hardware through its paces, pushes the previous limits, and sees what that does to its lifespan, there's nothing to meaningfully calculate.

gleenn 3 hours ago | parent | prev [-]

And the fact that their system doesn't dump water. I think that is actually perhaps the bigger deal. Datacenters have been getting a lot of heat (pun intended) for using significant fresh water at the expense of local municipalities.

frollogaston 3 hours ago | parent [-]

Closed-loop water cooling of chips is nothing new. There are two separate water systems that often get conflated*. The loop warms up the water, which is recycled but first needs to be cooled externally somehow. Normally they use evaporative cooling towers that do use water, or chillers that don't use water but use more energy. But they're claiming they can get that water loop so much hotter than the outdoor environment that active cooling isn't needed. They attribute this to improving the chip-to-water interaction.

Even air-cooled datacenters work somewhat the same way, but instead of water to chips, it's air. The air goes into hot aisles then exchanges heat with water, after which, see above.

* Other datacenter marketing materials talk about how they have a "closed loop system that uses no water" and they do still use water in the evap towers. I was half expecting this article to be that again, glad it wasn't.

XorNot 2 hours ago | parent [-]

Just because it's not new doesn't mean that it was available or that the engineering needed to bring it to mass market wasn't significant.

frollogaston 2 hours ago | parent [-]

It was available; there are plenty of water-cooled datacenters already, or water-cooled racks fitted into existing sites. Nvidia improved the cooling efficiency though.

RachelF 4 hours ago | parent | prev | next [-]

The "innovation" is that everything is now attached to a watercooled block.

The rest is marketing: Cray supercomputers were fluid-cooled back in the 1980s; the entire board had an inert liquid flowing across it.

jasonwatkinspdx an hour ago | parent | next [-]

When my grandpa retired from Monsanto chemical back in the 90s, I helped him clean out his office and got a tour of a bunch of stuff.

He showed me their Cray, which had its own dedicated computer room, and they set it up with the coolant pump and fountain unit right in the middle in front of a glass wall facing the hallway so everyone could gawk at it.

frollogaston 3 hours ago | parent | prev | next [-]

The innovation is being able to run the chips at higher temps without ruining them too quickly.

dietr1ch 2 hours ago | parent [-]

Haven't AMD CPUs been targeting a 95°C limit for 5+ years already? I'd have guessed servers could do 60°C without degrading much before more power-efficient hardware becomes available to switch to.

frollogaston 2 hours ago | parent [-]

95°C is the core temp, not ambient. My parent comment was probably wrong though, see https://news.ycombinator.com/item?id=48667527

fennec-posix 4 hours ago | parent | prev [-]

My partner lamented the same thing... Cray was doing this 40+ years ago

briandw 3 hours ago | parent | next [-]

Cray used Fluorinert, a fluorocarbon coolant. So not exactly an environmentally friendly solution.

trhway 3 hours ago | parent | prev [-]

Poor water quality clogging the pipes integrated into the PCBs (which then required replacing the PCBs) was said to be what killed those few USSR Elbrus supercomputer installations.

loeg 5 hours ago | parent | prev | next [-]

You have to design your hardware to tolerate being run in consistently hotter conditions. There's a tradeoff between cooling cost and failure rate / capex.

frollogaston 2 hours ago | parent | next [-]

Doesn't look like they made the hardware more tolerant of temperature, rather they made it remove waste heat more quickly.

"NVIDIA’s thermal engineering team reworked how those components handle heat, designing cooling loops that simplify how liquid is routed to multiple high-power chips on the board using a single inlet and outlet, resulting in a cleaner tray-level cooling architecture"

AlotOfReading 3 hours ago | parent | prev [-]

Nvidia's automotive and aerospace variants get ratings up to 85C, for comparison.

taneq 3 hours ago | parent [-]

Don’t their consumer GPUs run at 85C core temp? Maybe not for as long though.

frollogaston 3 hours ago | parent | next [-]

Core temp though. Ambient temp is a different story, and also depends on air vs water. In fact the article suggests the difference is getting the water more directly onto the chips, no mention of running at a higher core temp.

NekkoDroid 3 hours ago | parent | prev | next [-]

AMD CPUs basically all boost up to 90°C as a relatively normal operating temperature as long as the power (and some other factors) allow it. I assume AMD's and NVIDIA's GPUs do too, but I play mostly CPU-bound games so I see mine just sitting at ~60°C under load.

AlotOfReading 2 hours ago | parent | prev [-]

Temperature ratings are the allowed ambient temperature. The actual silicon will inevitably operate somewhat higher, because coolers are just moving heat down a temperature gradient.

sheepscreek 2 hours ago | parent | prev | next [-]

Speculating here - “effectively” cooling the CPU and GPU materially using this technique at datacenter scale may have never been done. Those things can run hot, easily crossing 100C. So the loop is doing a lot of work to keep them stable at 55C.

The innovation may be in the speed or volume flow of the coolant through different parts of the data centre to regulate the temperature. And of course, redesigning every component to be compatible with this fan-less design.

I think it’s only possible because NVIDIA is much more vertically integrated than ever before.

taneq 3 hours ago | parent | prev [-]

Is this not how it was already done? Huh.