hnlmorg 2 hours ago
> Even a synthetic probe needs a few failures to trigger an alert.

It doesn't "need" that. That's just how most people set it up, because it's an easy, sane default that allows for network jitter without inexperienced engineers having to think about different conditions triggering different types of responses.

If you're measuring internal APIs from an observability solution that already has nodes inside your network enclave, then there is a strong argument for alerting early.

> You should not alert on cpu, ram, etc

That's not true as an absolute statement. As a generalisation, it heavily depends on the system you're monitoring and how it behaves under pressure.

But in any case, I wasn't suggesting CPU alerts were the end goal. I said:

> these types of metrics are generally bespoke to the type of application you're monitoring.

I.e. you'll use metrics, but those metrics will be highly specific. The CPU examples were an illustration of what a "metric" is (it might seem obvious, but not everyone is an expert), but the point was that HTTP response codes aren't the only type of metric one should be capturing and watching.
lokar 2 hours ago
Ah, yes, I misunderstood.

And I have seen cases where a direct CPU alert makes sense, but 99 times out of 100 when I see one, it's nothing but trouble. Worse, I tend to see the CPU alert where there are no end-to-end synthetic alerts, 500 alerts, queue depth alerts, etc.

If your requests are fast and cheap, you can probe frequently relative to your goals, but often that's not really possible (think long SQL queries, or scheduling a container/pod). There you need several datapoints, or possibly fewer augmented with other signals.
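To make the "several datapoints, or fewer augmented with other signals" idea concrete, here is a rough Python sketch. Everything in it is an assumption for illustration: the `ProbeAlertRule` name, the thresholds of 3 and 1, and the boolean `other_signal` flag (standing in for something like an elevated 500 rate or queue depth) do not come from any particular monitoring tool.

```python
from dataclasses import dataclass


@dataclass
class ProbeAlertRule:
    """Hypothetical rule: alert after N consecutive probe failures.

    When a corroborating signal is present (e.g. elevated 5xx rate or
    queue depth), a lower failure count is enough. Thresholds are
    illustrative defaults, not recommendations.
    """

    failures_needed: int = 3              # probe failures required on their own
    failures_needed_with_signal: int = 1  # required when another signal agrees
    consecutive_failures: int = 0         # running count, reset on success

    def observe(self, probe_ok: bool, other_signal: bool = False) -> bool:
        """Record one probe result; return True if an alert should fire."""
        if probe_ok:
            self.consecutive_failures = 0
            return False
        self.consecutive_failures += 1
        needed = (
            self.failures_needed_with_signal
            if other_signal
            else self.failures_needed
        )
        return self.consecutive_failures >= needed
```

For a slow, expensive probe you might only get one datapoint every few minutes, so calling `observe(False, other_signal=True)` lets a single failure page someone when another signal already agrees, while an isolated failure still has to repeat before anything fires.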