lokar 3 hours ago
Even a synthetic probe needs a few failures to trigger an alert. You should not alert on cpu, ram, etc
hnlmorg 2 hours ago
> Even a synthetic probe needs a few failures to trigger an alert.

It doesn’t "need" that. That’s just how most people set it up, because it’s an easy, sane default that allows for network jitter without requiring inexperienced engineers to think about different conditions triggering different types of responses.

If you’re measuring internal APIs from an observability solution that already has nodes inside your network enclave, then there is a strong argument for alerting early.

> You should not alert on cpu, ram, etc

That’s not true as an absolute statement. As a generalisation, it heavily depends on the system you’re monitoring and how it behaves under pressure.

But in any case, I wasn’t suggesting CPU alerts were the end goal. I said:

> these types of metrics are generally bespoke to the type of application you’re monitoring.

I.e. you’ll use metrics, but those metrics will be highly specific. The CPU examples were an illustration of what a “metric” is (it might seem obvious, but not everyone is an expert). The point was that HTTP response codes aren’t the only type of metric one should be capturing and watching.
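To make the threshold point concrete, here is a minimal sketch of a consecutive-failure alert policy. Everything in it (`ProbeAlertPolicy`, `failures_to_alert`, the sample results) is hypothetical and not taken from any particular monitoring product. It only illustrates that "a few failures before alerting" is a tunable setting: an external probe might tolerate jitter, while a probe running inside the network enclave might alert on the first failure.

```python
from dataclasses import dataclass


@dataclass
class ProbeAlertPolicy:
    """Hypothetical policy: alert once N probes in a row have failed."""
    failures_to_alert: int
    consecutive_failures: int = 0

    def record(self, success: bool) -> bool:
        """Record one probe result; return True if an alert should be active."""
        if success:
            # Any success resets the streak, which is what absorbs jitter.
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1
        return self.consecutive_failures >= self.failures_to_alert


# Assumed settings: an external probe tolerates jitter (three failures in a row),
# while an internal probe inside the enclave alerts on the first failure.
external = ProbeAlertPolicy(failures_to_alert=3)
internal = ProbeAlertPolicy(failures_to_alert=1)

# Made-up probe results: True means the probe succeeded.
results = [True, False, True, False, False, False]
external_alerts = [external.record(r) for r in results]
internal_alerts = [internal.record(r) for r in results]
```

With the same sequence of results, the internal policy fires on the first isolated failure, while the external one stays quiet until three failures arrive back to back. Which behaviour is right depends on where the probe runs and what a single failure actually means for that system.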