sovietmudkipz 4 days ago

Yea. I realize I ought to dig into things more to understand how to push into 90%-95% utilization territory. Thanks for the resource to read through.

mpyne 4 days ago | parent | next [-]

You absolutely do not want 90-95% utilization. At that level of utilization, random variability alone is enough to cause massive whiplash in average queue lengths.

For a single-server/single-queue system, the cycle-time impact of variability at 95% load is nearly 25x what it is at 75% load, and there are similar measures for other kinds of process queues.
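A rough sketch of why (not the exact model behind the 25x figure, just the same effect): simulating a single-server queue with Poisson arrivals and exponential service times shows how non-linearly cycle time grows with load.

    import random

    def mm1_cycle_time(rho, n_jobs=200_000, service_mean=1.0, seed=1):
        # Single-server queue, Poisson arrivals, exponential service.
        # Returns the mean time a job spends in the system (queueing +
        # service), i.e. its cycle time, in units of the mean service time.
        rng = random.Random(seed)
        arrival_mean = service_mean / rho      # mean inter-arrival time
        arrival = server_free = total = 0.0
        for _ in range(n_jobs):
            arrival += rng.expovariate(1.0 / arrival_mean)
            start = max(arrival, server_free)
            server_free = start + rng.expovariate(1.0 / service_mean)
            total += server_free - arrival
        return total / n_jobs

    for rho in (0.75, 0.80, 0.95):
        print(f"{rho:.0%} load -> mean cycle time ~{mm1_cycle_time(rho):.1f}x service time")

    # Theory for this queue says cycle time = 1 / (1 - rho): 4x at 75%,
    # 5x at 80%, 20x at 95% -- and the variance blows up even faster.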

As the other comment notes, you should really work from an assumption that 80% is max loading, just as you'd never aim to have a swap file or swap partition of exactly the amount of memory overcommit you expect.

rcxdude 4 days ago | parent | next [-]

Man, if there's one idea I wish I could jam into the head of anyone running an organization, it would be queuing theory. So many people can't understand that slack is necessary to have quick turnaround.

sovietmudkipz 3 days ago | parent [-]

Mmmm, I remember reading about this in Systems Performance by Brendan Gregg. I should revisit what was written…

sovietmudkipz 3 days ago | parent | prev [-]

I target 80% utilization because I’ve seen that figure multiple times. I suppose I should rephrase: I’d like to understand the constraints and systems involved that make 80% considered full utilization. There’s obviously something that limits an OS; is it tunable?

These are the questions I imagine a thorough multiplayer solutions engineer would be curious about, the kind of person who’s trying to squeeze as much juice out of the hardware as possible.

btschaegg 3 days ago | parent | next [-]

It might not be the OS, but just statistical inevitability. If you're talking about CPU utilization on Linux, for example, it's not all that unlikely that the number you're staring at isn't "time spent by CPU doing things" but "average CPU run queue length". "100%" then doesn't only mean the CPU gets no rest, but "there's always someone waiting for a CPU to become free". It likely pays off to understand where the load numbers in your tooling actually come from.
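For example (a quick sketch, not tied to any particular monitoring tool): on Linux the load average counts runnable tasks plus tasks in uninterruptible sleep, which is not the same thing as "time the CPU spent busy".

    import os

    # Load averages are roughly "how many tasks are running or waiting
    # to run", averaged over 1/5/15 minutes -- not a busy-time percentage.
    load1, load5, load15 = os.getloadavg()
    cores = os.cpu_count()
    print(f"1-min load {load1:.2f} on {cores} cores -> {load1 / cores:.0%} of run-queue 'capacity'")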

Even if that weren't the case, lead times for tasks will always increase with more utilization; see e.g. [1]: If you push a system from 80% to 95% utilization, you have to expect a ~4.75x increase in lead time for each task _on average_: (0.95/0.05) / (0.8/0.2)
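For reference, the approximation from [1] looks roughly like

    E[W_q] ≈ (ρ / (1 − ρ)) · ((c_a² + c_s²) / 2) · τ

where τ is the mean service time and c_a, c_s are the coefficients of variation of the inter-arrival and service times.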

Note that all except the term containing ρ in the formula are defined by your system/software/clientele, so you can drop them for a purely relative comparison.

[1]: https://en.wikipedia.org/wiki/Kingman%27s_formula

Edit: Or, to try to picture the issue more intuitively: if you're on a highway nearing 100% utilization, you're likely standing in a traffic jam. And if that's not (yet) strictly the case, the probability of a small hiccup creating one increases exponentially.

mpyne 3 days ago | parent | prev [-]

> I’d like to understand the constraints and systems involved that make 80% considered full utilization. There’s obviously something that limits a OS; is it tunable?

There are OS tunables, and they will have some impact on overall system performance.

But the things that make high-utilization systems so bad for cycle time are inherent to any queue-based system; you can't escape them through better tuning, because they weren't caused by a lack of tuning in the first place.

If you can tune a system so that what would previously have been 95% loading is instead 82% loading, you'll see significant performance improvements, but you'd erase all of those improvements if you then let the system climb back up to 95% loaded.
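With made-up numbers, using the ρ/(1−ρ) factor from Kingman's formula in the sibling comment:

    ρ = 0.95 → ρ/(1−ρ) = 19.0
    ρ = 0.82 → ρ/(1−ρ) ≈ 4.6    (tuning bought roughly a 4x cut in queueing delay)
    ρ = 0.95 → 19.0 again       (let load creep back up and the win is gone)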

sovietmudkipz 3 days ago | parent [-]

Hmmm, makes sense. Sounds like I may have had a mistaken mental model of resource consumption. I ought to reread https://technology.riotgames.com/news/valorants-128-tick-ser... (specifically the section on “Real World Performance”, where the engineer describes tuning) now that I have a better appreciation that they’re not trying to push resource utilization % higher, but instead to make more resources available through tuning.

mpyne 2 days ago | parent [-]

Yeah, a big thing is latency vs. throughput.

That's a great article you linked; it basically notes up front what the throughput requirements are in terms of cores per player, which then sets the budget for what the latency can be for a single player's game.

Now, imagine for a second that they got the average game to just barely meet their frame-time threshold and tried to optimize so that they were running right at 99% capacity. They'd have put themselves in an extremely dangerous position in terms of meeting latency requirements.

Any variability in hitting that frame time would cause a player to bleed over into the next player's game, reducing the amount of time the server had to process that other player's game ticks. That would percolate down the line, impacting a great many players' games just because of one tiny little delay in handling one player's game.

In fact, it's for reasons like this that they started off with a flat 10% fudge adjustment to account for OS/scheduling/software overhead. By doing so they've in principle already baked in a 5-8% reduction in capacity usage compared to theoretical.

But you'll notice in the chart they show from recent game sessions in 2020 that the aggregate server frame time didn't hang out at 2.34 ms (their adjusted per-server target); it actually tended to average around 2.0 ms, or about 85% of the already-lowered target.
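(Back-of-the-envelope: 2.0 ms / 2.34 ms ≈ 0.85.)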

And that same chart makes clear why that is important, as there was some pretty significant variability in each day's aggregate frame times, with some play sessions even going above 2.34 ms on average. Had they been operating at exactly 2.34 ms they would definitely have needed to add more server capacity.

But because they were in practice aiming at 85% usage (of a 95% usage figure), they had enough slack to absorb the variability they were seeing and stay within ±1% of their overall server expectations.

Statistical variability is a fact of life, especially when humans and/or networks are involved, and systems don't respond well to variability when they are loaded to maximum capacity, even if it seems like that would be the most cost-effective.

Typically, running at maximum capacity only works where it's OK to ignore timing variability, such as in batch processing (where cost-effective throughput is more valuable than low latency).

colechristensen 4 days ago | parent | prev [-]

One way to think about it is that 80% IS full utilization.

At some point, the engineering time, the risk of decreased performance, and the fragility of pushing the limit stop being worth the benefit of reaching some higher utilization metric. That optimum trade-off point exists somewhere, even if it's not where you are right now.