> Anecdotally I know robust tracing data can help me find problems in 5-15 minutes that would have taken hours or days with manual probing and scouring logs.
exactly. high-cardinality, wide structured events are the way.