jpollock 4 hours ago
Measurement and alerting are usually done on business metrics, not on the causes. That way you catch whole classes of problems. Not sure about expected loss; is that a decay rate? But stuck jobs show up via the number of tasks being processed and the average latency.
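A rough sketch of what symptom-based alerting could look like for the stuck-jobs case: alert on tasks processed and average latency over a window, without looking at why things are slow. The names (`WindowStats`, `symptom_alerts`) and the thresholds are made up for illustration, not taken from any particular monitoring system.

```python
from dataclasses import dataclass


@dataclass
class WindowStats:
    """Aggregates for one measurement window (hypothetical shape)."""
    tasks_completed: int     # tasks finished during the window
    total_latency_s: float   # sum of per-task latencies, in seconds


def symptom_alerts(stats, min_completed=10, max_avg_latency_s=30.0):
    """Return alert names based only on symptoms, not on causes.

    Thresholds are illustrative assumptions; real values depend on the workload.
    """
    alerts = []
    # Too few tasks finishing is a symptom of stuck jobs, whatever the cause.
    if stats.tasks_completed < min_completed:
        alerts.append("low_throughput")
    # Average latency is only defined when at least one task completed.
    if stats.tasks_completed > 0:
        avg_latency_s = stats.total_latency_s / stats.tasks_completed
        if avg_latency_s > max_avg_latency_s:
            alerts.append("high_avg_latency")
    return alerts
```

A window with no completed tasks trips `low_throughput` regardless of whether the underlying cause is a deadlock, a dead worker, or a full queue, which is the point of alerting on the metric instead of the cause.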