Show HN: TraceRoot – Open-source agentic debugging for distributed services (github.com)
40 points by xinweihe 2 days ago | 16 comments
Hey, Xinwei and Zecheng here. We are the authors of TraceRoot (https://github.com/traceroot-ai/traceroot).

TraceRoot (https://traceroot.ai) is an open-source debugging platform that helps engineers fix production issues faster by combining structured traces, logs, source code context, and discussions from GitHub PRs, issues, and Slack channels with AI agents.

At the heart are our lightweight Python (https://github.com/traceroot-ai/traceroot-sdk) and TypeScript (https://github.com/traceroot-ai/traceroot-sdk-ts) SDKs. They hook into your app using OpenTelemetry and capture logs and traces. These are sent either to a local Jaeger (https://www.jaegertracing.io/) + SQLite backend or to our cloud backend, where we correlate them into a single view.

From there, our custom agent takes over. The agent builds a heterogeneous execution tree that merges spans, logs, and GitHub context into one internal structure. This allows it to model the control and data flow of a request across services. It then uses LLMs to reason over this tree: pruning irrelevant branches, surfacing anomalous spans, and identifying likely root causes. You can ask questions like “what caused this timeout?” or “summarize the errors in these 3 spans”, and it can trace the failure back to a specific commit, summarize the chain of events, or even propose a fix via a draft PR.

We also built a debugging UI that ties everything together. You explore traces visually, pick spans of interest, and get AI-assisted insights with full context: logs, timings, metadata, and surrounding code.

Unlike most tools, TraceRoot stores long-term debugging history and builds structured context for each company - something we haven’t seen many others do in this space.

What’s live today:

- Python and TypeScript SDKs for structured logs and traces.
- AI summaries, GitHub issue generation, and PR creation.
- Debugging UI that ties everything together.

TraceRoot is MIT licensed and easy to self-host (via Docker). We support both local mode (Jaeger + SQLite) and cloud mode. Inspired by OSS projects like PostHog and Supabase, the core is free, while enterprise features like agent mode, multi-tenancy, and Slack integration are paid.

If you find it interesting, you can see a demo video here: https://www.youtube.com/watch?v=nb-D3LM0sJM

We’d love you to try TraceRoot (https://traceroot.ai) and share any feedback. If you're interested, our code is available here: https://github.com/traceroot-ai/traceroot. If we don’t have something, let us know and we’d be happy to build it for you. We look forward to your comments!
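To make the execution-tree idea concrete, here is a minimal sketch of merging spans and logs into one tree and then surfacing suspicious paths. It is an illustration only, not TraceRoot's actual implementation: `SpanNode`, `LogRecord`, `build_tree`, `suspicious_paths`, the 1000 ms threshold, and the sample data are all hypothetical, and a fixed threshold stands in for the LLM-based reasoning described above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class LogRecord:
    span_id: str   # span that emitted the log (hypothetical field name)
    level: str
    message: str


@dataclass
class SpanNode:
    span_id: str
    parent_id: Optional[str]
    service: str
    name: str
    duration_ms: float
    logs: list = field(default_factory=list)
    children: list = field(default_factory=list)


def build_tree(spans, logs):
    """Attach each log to its span, link spans by parent_id, return the roots."""
    nodes = {s.span_id: s for s in spans}
    for record in logs:
        node = nodes.get(record.span_id)
        if node is not None:
            node.logs.append(record)
    roots = []
    for node in nodes.values():
        parent = nodes.get(node.parent_id) if node.parent_id else None
        if parent is None:
            roots.append(node)  # no known parent: treat as a root span
        else:
            parent.children.append(node)
    return roots


def suspicious_paths(node, slow_ms=1000.0, path=()):
    """Yield root-to-span paths ending at a span that logged an ERROR or ran slow."""
    path = path + (f"{node.service}:{node.name}",)
    has_error = any(r.level == "ERROR" for r in node.logs)
    if has_error or node.duration_ms >= slow_ms:
        yield path
    for child in node.children:
        yield from suspicious_paths(child, slow_ms, path)


# Made-up sample request crossing three services.
spans = [
    SpanNode("a", None, "gateway", "GET /checkout", 1250.0),
    SpanNode("b", "a", "cart", "load_cart", 40.0),
    SpanNode("c", "a", "payments", "charge", 1180.0),
]
logs = [
    LogRecord("c", "ERROR", "upstream timeout"),
    LogRecord("b", "INFO", "cart loaded"),
]
roots = build_tree(spans, logs)
flagged = [p for root in roots for p in suspicious_paths(root)]
```

In this sample, the healthy `cart` branch is dropped, while the slow `gateway` root and the failing `payments` span are kept. A tree pruned this way is the kind of reduced context that could then be handed to a model.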
autorinalagist 9 hours ago
Very cool! I have a question: how are you evaluating performance while you develop this? Do you have a golden set of examples that you evaluate against?
sand_9999 a day ago
I can connect MCP for Datadog/NewRelic/Cloudwatch logs. Cursor or ClaudeCode would give me all that I need. Are you doing something new here?
lmeyerov 2 days ago
I'm curious -- let's say we have claude code hooked up to MCPs for jaeger, grafana, and the usual git/gh CLIs it can use out-of-the-box, and we let claude's planner work through investigations with whatever help we give it. Would TraceRoot do anything clever wrt the AI that such a setup wouldn't/couldn't? (I'm asking b/c we're planning a setup that's basically that, so real question.)
thatrandybrown 2 days ago
I like the idea of this and the use case, but don't love the tight coupling to OpenAI. I'd love to see a framework for allowing BYOM (bring your own model).
jinusunil 20 hours ago
How do you evaluate the output of your trace tool? Are there benchmarks for tracing tools?