signalbright 2 hours ago
Great question! The investigation agent has access to all the telemetry, not just one service's, so it can trace the root cause even in complex cases like this. OpenTelemetry has good ways to link operations across services (for example, passing the parent trace ID in an inter-service HTTP/gRPC request). It's a bit tedious to do by hand, which is why we're publishing the skill that does it for you. And totally agreed on config changes and deploy info. We've seen that good environment and version-control tagging (commit hash, file name, line number) is extremely important for root cause analysis, so we go hard on this in the skills. We also have many infra integrations on our roadmap so that we can analyze the infra/config side of things in depth.
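For readers who haven't wired this up before, here is a rough sketch of what "passing the parent trace ID" between services amounts to. It hand-rolls the W3C Trace Context `traceparent` header (`00-<trace-id>-<parent-span-id>-<flags>`), which is the format OpenTelemetry's default propagator uses. The helper names (`inject`, `extract`, `start_server_span`) and the dict-based span are made up for illustration and are not the OpenTelemetry SDK API; in a real service the SDK's propagators do this for you.

```python
import re
import secrets

# W3C Trace Context, version 00: 32 hex chars of trace id,
# 16 hex chars of parent span id, 2 hex chars of flags.
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def new_trace_id() -> str:
    return secrets.token_hex(16)  # 16 random bytes -> 32 hex chars


def new_span_id() -> str:
    return secrets.token_hex(8)  # 8 random bytes -> 16 hex chars


def inject(headers: dict, trace_id: str, span_id: str, sampled: bool = True) -> dict:
    """Caller side: copy the headers and add a traceparent entry.

    Hypothetical helper. HTTP header names are case-insensitive;
    this sketch simply uses lowercase keys throughout.
    """
    out = dict(headers)
    out["traceparent"] = f"00-{trace_id}-{span_id}-{'01' if sampled else '00'}"
    return out


def extract(headers: dict):
    """Callee side: return (trace_id, parent_span_id), or None if absent/invalid."""
    match = TRACEPARENT_RE.match(headers.get("traceparent", ""))
    if match is None:
        return None
    trace_id, parent_span_id, _flags = match.groups()
    # All-zero ids are invalid per the spec.
    if trace_id == "0" * 32 or parent_span_id == "0" * 16:
        return None
    return trace_id, parent_span_id


def start_server_span(headers: dict) -> dict:
    """Continue the caller's trace if a valid header exists, else start a new one."""
    ctx = extract(headers)
    if ctx is None:
        return {"trace_id": new_trace_id(), "span_id": new_span_id(), "parent_span_id": None}
    trace_id, parent_span_id = ctx
    return {"trace_id": trace_id, "span_id": new_span_id(), "parent_span_id": parent_span_id}


# Service A makes a request; service B continues the same trace.
client_span = {"trace_id": new_trace_id(), "span_id": new_span_id()}
outgoing_headers = inject({"accept": "application/json"}, client_span["trace_id"], client_span["span_id"])
server_span = start_server_span(outgoing_headers)
```

Because both spans share a `trace_id` and the server span records the client span as its parent, a backend can stitch the two services' telemetry into a single trace.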
byoj an hour ago
Interesting product, but I had a similar question. I think it will take a little time to mature for production systems: what I can see right now is very straightforward, and most observability providers already do this if you have an observability stack set up. We currently use OpenObserve; they have an AI agent that provides correlation, cause, and fix for any issue. The real differentiator will be how accurately you can do the investigations, and how brutally you can stress-test its ability to locate the issue, cause, and fix. Good luck on the launch!