0manrho 4 days ago

This is a legitimate problem in datacenters. They're getting to the point where a single 40(ish) OU/RU rack can pull a megawatt in some hyperdense cases. The talk of GPU/AI datacenters consuming inordinate amounts of energy isn't just because the DCs are yuge (although some are), but because the power draw per rack unit of space is going through the roof as well.

On the consumer side of things, where the CPUs are branded Ryzen or Core instead of Epyc or Xeon, a significant chunk of that power consumption comes from the boosting behavior they implement to pseudo-artificially[0] inflate their performance numbers. You can save a lot of energy (easily 10%, often closer to 30%, but it really depends on the exact build/generation) by doing a very mild undervolt and limiting boosting behavior on these CPUs while keeping the same base clocks. Intel 11th through 14th gen CPUs are especially guilty of this, as are most Threadripper CPUs. You can often trade single-digit or even negligible performance losses (depends on what you're using it for and how much you undervolt/underclock/restrict boosting) for double-digit reductions in power usage. This phenomenon also holds for GPUs across the enterprise/consumer divide, but not quite to the same extent in most cases.
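For illustration, here's a minimal sketch of what the boost-limiting half of that looks like on Linux; the sysfs paths (intel_pstate's no_turbo and the generic cpufreq boost knob) are standard but driver-dependent, so treat them as assumptions to verify on your own machine, and note it needs root.

    #!/usr/bin/env python3
    # Minimal sketch: disable CPU boost via the standard Linux sysfs knobs.
    # Assumes root and a kernel exposing one of these interfaces.
    from pathlib import Path

    # intel_pstate driver: writing "1" to no_turbo disables turbo boost.
    INTEL_NO_TURBO = Path("/sys/devices/system/cpu/intel_pstate/no_turbo")
    # acpi-cpufreq / amd-pstate drivers: writing "0" to boost disables boosting.
    GENERIC_BOOST = Path("/sys/devices/system/cpu/cpufreq/boost")

    def set_boost(enabled: bool) -> None:
        if INTEL_NO_TURBO.exists():
            INTEL_NO_TURBO.write_text("0" if enabled else "1")
        elif GENERIC_BOOST.exists():
            GENERIC_BOOST.write_text("1" if enabled else "0")
        else:
            raise RuntimeError("no known boost control found; driver may differ")

    if __name__ == "__main__":
        set_boost(False)  # cap the CPU at its base clocks

The undervolt half is usually done in firmware or with vendor tools (BIOS curve offsets and the like), so this only covers limiting boost.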

Point being, yeah, it's a problem in data centers, but honestly there's still a lot of headroom on the common American 15A@120VAC outlet before you need to call your electrician to upgrade your panel and/or install 240VAC outlets or what have you.
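To put rough numbers on that headroom (a back-of-the-envelope sketch, not electrical advice; the 80% continuous-load derating and the 400 W per-box figure are illustrative assumptions):

    # Rough outlet-headroom arithmetic for a 15 A / 120 VAC circuit.
    amps, volts = 15, 120
    nominal_w = amps * volts               # 1800 W nominal circuit capacity
    continuous_w = nominal_w * 0.80        # ~1440 W under the common 80% rule
    per_box_w = 400                        # assumed draw of one tuned workstation
    print(int(continuous_w // per_box_w))  # -> 3 such boxes, with margin left over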

0: I say pseudo-artificial because the performance advantages are real, but unless you're doing some intensive/extreme cooling, they aren't sustainable or indicative of nominal performance, just a brief bit of extra headroom before your cooling solution heat-soaks and the CPUs/GPUs throttle themselves back down. But it lets them put the "bigger number means better" on the box for marketing.

Panzer04 4 days ago | parent | next [-]

It's not just about better numbers. Getting high clocks for a short period helps in a lot of use cases - say random things like a search. If I'm looking for some specific phrase in my codebase in vscode, everything spins up for the second or two it takes to process that.

Boosting from 4 to 5.5 GHz for that brief period shaves off a fraction of a second - repeat that for any similar operation and it adds up.
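As a rough illustration (assuming the task is purely clock-bound, which real searches rarely are, so this is an upper bound on the saving):

    # Idealized, clock-bound estimate of time saved by a boost from 4 to 5.5 GHz.
    base_ghz, boost_ghz = 4.0, 5.5
    task_s_at_base = 2.0                   # a ~2 second search at base clock
    task_s_at_boost = task_s_at_base * base_ghz / boost_ghz
    print(f"{task_s_at_base - task_s_at_boost:.2f} s saved")  # ~0.55 s per search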

0manrho a day ago | parent [-]

Yes, I figured that much would be obvious to this crowd. Thus the "pseudo" part.

The point isn't that there isn't a benefit, it's that past a certain point you start paying disproportionately more energy per extra 0.1 GHz. Furthermore, AMD and Intel were exceptionally aggressive about it in the generations I outlined (for AMD, the Ryzen 7000 series specifically), leading to instability issues on both platforms, either because the spec itself was too aggressive or because AIB partners implemented that spec improperly; the headroom that typically exists above factory stock for pushing clocks/voltages further simply wasn't there in some silicon. Some of it comes down to silicon lottery and manufacturing defects/mistakes (Intel's oxidation issues, for example), but we're really getting into the weeds on this already.
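That steep cost follows from the usual first-order model of dynamic CPU power, P ≈ C·V²·f: frequency enters linearly, but the extra voltage needed to hold the higher clock enters squared. A toy sketch (the voltage/frequency points are invented for illustration; every chip's real V/f curve differs):

    # Toy illustration of P ~ C * V^2 * f with made-up (GHz, volts) points.
    points = [(4.0, 1.00), (5.0, 1.20), (5.5, 1.35)]
    base_f, base_v = points[0]
    for f, v in points:
        rel_power = (v * v * f) / (base_v * base_v * base_f)
        print(f"{f} GHz @ {v:.2f} V -> {rel_power:.2f}x relative power")

In this made-up example the last ~38% of clock costs roughly 2.5x the power, which is the shape of the curve I'm describing.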

And to clarify: I'm talking specifically about Intel's Turbo Boost and AMD's PBO boosting technologies, where they boost well above base clocks, separate from the general dynamic clocking behavior where clocks drop well below base when not in (heavy) use.

latchkey 2 days ago | parent | prev | next [-]

> They're getting to the point where a single 40(ish) OU/RU rack can pull a megawatt in some hyperdense cases.

Switch is designing for 2MW racks now.

spacedcowboy 4 days ago | parent | prev | next [-]

unless it’s an Apple data center, populated by the server version of the latest ultra chips…

0manrho 3 days ago | parent | next [-]

What makes you think that?

They're small and efficient, which means large numbers of them can be packed into small spaces, resulting in a similarly large power draw per volume of equipment in the DC. This is especially true with Apple's "UltraFusion" tech, which they're developing as a quasi-analog to Nvidia's Grace (Hopper) superchips.

spacedcowboy 3 days ago | parent [-]

Because I worked on them, before retiring. Yes they’re packed in; no they still don’t draw the same levels of power.

0manrho a day ago | parent [-]

Didn't say they draw the same; I openly acknowledge they're more efficient. I said power use per rack unit is trending up. This is true of Apple DCs as well, especially with their new larger/fused chip initiatives. It's a universal industry trend, especially with AI compute, and Apple is not immune.

spacedcowboy a day ago | parent [-]

Let me rephrase to: No, they (collectively) don’t draw the same levels of power. I know what amperage is drawn by each rack. It’s nowhere near as much as was drawn by the older intel-based racks.

And yes, they’re packed densely.

deafpolygon 3 days ago | parent | prev [-]

at that point, they're powered by a bicycle.

ciupicri 3 days ago | parent | prev [-]

How safe is undervolting? Can it cause stability issues?

0manrho a day ago | parent [-]

Far safer than overvolting.

Changing settings can lead to stability issues no matter which way you push things, frankly. If you don't know what you're doing or aren't comfortable with it, it's probably not worth it.
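If you do want to experiment, a cheap sanity check after each undervolt step is to hammer the CPU with work whose answer you already know and watch for wrong results or crashes. A minimal sketch of that idea (iteration counts are arbitrary; it's no substitute for a proper stress test like Prime95 or stress-ng):

    # Crude stability smoke test: every worker redoes the same deterministic
    # computation under full load; on an unstable undervolt this tends to show
    # up as mismatched results or outright crashes. Counts are arbitrary.
    import hashlib
    from multiprocessing import Pool, cpu_count

    def work(_):
        h = hashlib.sha256(b"stability-check")
        for _ in range(2_000_000):      # keep each core busy for a while
            h = hashlib.sha256(h.digest())
        return h.hexdigest()

    if __name__ == "__main__":
        with Pool(cpu_count()) as pool:
            results = pool.map(work, range(cpu_count() * 4))
        assert len(set(results)) == 1, "mismatched results: likely unstable"
        print("no mismatches detected (not proof of stability)")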