sovietmudkipz (3 days ago):
Hmmm, makes sense. Sounds like my mental model of resource consumption was off. I ought to reread https://technology.riotgames.com/news/valorants-128-tick-ser... (specifically the "Real World Performance" section, where the engineer describes the tuning work) now that I have a better appreciation that they're not trying to push resource utilization % higher, but rather to free up more resources through tuning.
mpyne (2 days ago, in reply):
Yeah, a big thing is latency vs. throughput. That's a great article you linked, and it notes up front what the throughput requirement is in terms of cores per player, which in turn sets the budget for what the latency can be for a single player's game.

Now imagine they had tuned things so that the average game just barely met its frame time threshold, running right at 99% capacity. They would have put themselves in an extremely dangerous position for meeting latency requirements: any variability in hitting that frame time would cause one player's game to bleed over into the next player's, reducing the time the server had left to process that other game's ticks. That would percolate down the line, impacting a great many players' games because of one tiny delay in handling one player's game.

It's for reasons like this that they started off with a flat 10% fudge adjustment to account for OS/scheduling/software overhead. By doing so they had in principle already baked in a 5-8% reduction in capacity usage compared to the theoretical maximum.

But you'll notice in the chart of game sessions from 2020 that the aggregate server frame time didn't hover at 2.34 ms (their adjusted per-server target); it actually tended to average around 2.0 ms, or about 85% of the already-lowered target. The same chart makes clear why that matters: there was significant variability in each day's aggregate frame times, with some play sessions even averaging above 2.34 ms. Had they been operating at exactly 2.34 ms, they would definitely have needed to add more server capacity. Because they were in practice aiming at 85% usage (of an already-reduced ~95% figure), they had enough slack to absorb the variability they were seeing and stay within ±1% of their overall server expectations.

Statistical variability is a fact of life, especially when humans and/or networks are involved, and systems don't respond well to variability when they're loaded to maximum capacity, even if that seems like it would be the most cost-effective. Running that close to the limit typically only works where it's OK to ignore variability in time, such as batch processing, where cost-effective throughput matters more than low latency.
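To put a rough number on that knock-on effect, here's a toy Python sketch (my own, not from the article; the 0.15 ms jitter, 3 games per core, and Gaussian frame times are made-up assumptions, and only the 2.34 ms target comes from the post). It models game frames processed back-to-back on one core, with any overrun delaying the start of the next game's frame:

    import random

    random.seed(1)

    FRAME_BUDGET_MS = 2.34   # adjusted per-game frame-time target from the article
    GAMES_PER_CORE = 3       # assumption: games processed back-to-back on one core
    TICKS = 10_000

    def late_fraction(mean_ms, jitter_ms):
        # Each tick, the core runs GAMES_PER_CORE game frames in sequence.
        # If one frame runs long, the overrun eats into the next game's slot.
        late = 0
        for _ in range(TICKS):
            carry = 0.0  # delay inherited from the previous game's overrun
            for _ in range(GAMES_PER_CORE):
                cost = max(0.0, random.gauss(mean_ms, jitter_ms))
                finish = carry + cost
                if finish > FRAME_BUDGET_MS:
                    late += 1
                carry = max(0.0, finish - FRAME_BUDGET_MS)
        return late / (TICKS * GAMES_PER_CORE)

    print("avg 2.30 ms (almost no headroom):", late_fraction(2.30, 0.15))
    print("avg 2.00 ms (~15% headroom):     ", late_fraction(2.00, 0.15))

In this toy model the 2.0 ms average misses the deadline only on the order of a percent of frames, while running right up against the budget at 2.3 ms turns a large fraction of frames late, precisely because each overrun cascades into the next game's slot.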