tracker1 | 6 days ago
> There is rarely a stable definition of "work". Is it RPS? Request cost can vary even throughout the day. Is it instructions? Same, the typical IPC can vary.

I think this is probably one of the most important points. Similarly, is this public-facing work dealing with any kind of user request, or is it simply crunching numbers/data to build an AI model from a stable backlog/queue?

My take with modern multi-core, hyper-threaded, burstable CPUs has always been to consider ~60% a "loaded" server. That workload should be split if it stays that way for any significant portion of the day.

I mostly deal with user-facing services, so bursts and higher-traffic portions of the day are dramatically different from lower-utilization portions. A decade ago, this led to a lot of work around cloud provisioning on demand for the heavier load times. Today it's a bit more complicated when you have servers with 100+ cores as an option for under $30k (a guesstimate based on a $10k CPU price).

Today, I'd lean toward over-provisioning dedicated server hardware and supplementing with cloud services (and/or self-hosted cloud-like infrastructure on K8s) as pragmatically as reasonable... depending on the services, of course. I'm not currently in a position where I have this level of input, though. But how, as an example, StackOverflow scaled in the early days is even more possible/prudent today, to a much larger extent... You can go a very long way with a half/full rack and a 10Gb uplink in a colo data center or two.

In any case, for me, >= 65% CPU load for >= 30m/day means it's at 100% effective utilization and needs expansion relatively soon. Just my own take.
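The ">= 65% for >= 30m/day" rule of thumb above could be checked mechanically. A minimal sketch, assuming one utilization sample per minute and reading "30m/day" as cumulative hot minutes (the comment doesn't say whether the window must be contiguous); `needs_expansion` and its parameters are hypothetical names, not anything from the thread:

```python
# Sketch of the rule of thumb: flag a host as effectively saturated if CPU
# load is >= 65% for >= 30 minutes over a day's worth of samples.
# Assumes one utilization sample (0-100) per minute; cumulative reading.

def needs_expansion(samples_pct, threshold=65.0, minutes=30):
    """Return True if at least `minutes` samples meet or exceed `threshold`."""
    hot = sum(1 for s in samples_pct if s >= threshold)
    return hot >= minutes

# 24h of per-minute samples: mostly idle, with a 45-minute burst at 80%.
day = [20.0] * (24 * 60 - 45) + [80.0] * 45
print(needs_expansion(day))  # True: 45 hot minutes >= 30
```

If the rule were instead read as a contiguous 30-minute window, you'd track a running streak rather than a total count.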
everforward | 6 days ago
> In any case, for me... >= 65% CPU load for >= 30m/day means it's at 100% effective utilization, and needs expansion relatively soon.

I think this still depends on workload, because IO-heavy apps hyperthread well and can push up to 100%. Most of the apps I've worked on end up being IO-bound because "waiting on SQL results", or the more generic "waiting on downstream results", is 90% of their runtime. They might spend more time reading those responses off the wire than they do actually processing anything.

There are definitely workloads that isn't true of, though, and your metrics read about right to me.
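The "90% of runtime is waiting" claim is easy to see by separating wall-clock time from CPU time. A toy illustration (not anything from the thread; `handle_request` and its parameters are made up), using `time.sleep` as a stand-in for a downstream call:

```python
# Toy model of an IO-bound request handler: most wall time is spent blocked
# on a downstream (simulated with sleep), with only a sliver of real CPU work.
import time

def handle_request(io_wait_s=0.09, cpu_work=50_000):
    """Simulate one request: mostly waiting, a little processing."""
    time.sleep(io_wait_s)                        # "waiting on SQL results"
    return sum(i * i for i in range(cpu_work))   # actual processing

start_wall = time.perf_counter()
start_cpu = time.process_time()
for _ in range(5):
    handle_request()
wall = time.perf_counter() - start_wall   # includes the blocked time
cpu = time.process_time() - start_cpu     # only time actually on-CPU
print(f"wall={wall:.2f}s cpu={cpu:.2f}s ({cpu / wall:.0%} of wall time on CPU)")
```

A thread blocked like this leaves its core (and its hyperthread sibling) free, which is why such services can be driven much closer to 100% before queueing hurts.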