almogbaku 5 hours ago
Hi @jldugger, great points, but I think there's a domain confusion here. You're describing infra/code RCA. Kelet does quality RCA for AI agents: the agent returns a 200 OK but gives the wrong answer. The signal space is different. We're working with structured LLM traces plus explicit quality signals (thumbs-down, edits, eval scores), not distributed-system logs. Much more tractable.

Your Bayesian point actually resonates: separating good from bad sessions and looking for structural differences is close to what we do. But the hypotheses aren't "100 LLM guesses + k-means." Each one is grounded in actual session data: what the user asked, what the agent did, what came back, and what the signal was.

Curious about the dimensional analysis point: are you thinking of reducing the feature space before hypothesis generation?
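For readers following along, here's a toy sketch of the "separate good from bad sessions and look for structural differences" idea. All field names and data are hypothetical, not Kelet's actual schema; it just shows how a structured trace feature plus a quality signal can surface a grounded hypothesis:

```python
from collections import Counter

# Hypothetical session records: one structured trace feature ("tool")
# plus an explicit quality signal ("good" / "bad").
sessions = [
    {"tool": "search", "signal": "good"},
    {"tool": "search", "signal": "bad"},
    {"tool": "calc",   "signal": "good"},
    {"tool": "search", "signal": "bad"},
    {"tool": "calc",   "signal": "good"},
    {"tool": "search", "signal": "bad"},
]

def bad_rates(sessions, feature):
    """Per-value bad-signal rate for one structured feature."""
    totals, bads = Counter(), Counter()
    for s in sessions:
        totals[s[feature]] += 1
        if s["signal"] == "bad":
            bads[s[feature]] += 1
    return {v: bads[v] / totals[v] for v in totals}

rates = bad_rates(sessions, "tool")      # {"search": 0.75, "calc": 0.0}
baseline = sum(s["signal"] == "bad" for s in sessions) / len(sessions)  # 0.5

# Feature values whose bad-rate exceeds the baseline become candidate
# hypotheses, each traceable back to concrete sessions.
suspects = [v for v, r in rates.items() if r > baseline]  # ["search"]
```

With real traces you'd do this over many features (user intent, tool calls, response shape) and with proper statistics rather than a raw rate comparison, but the grounding principle is the same: every hypothesis points at specific sessions and signals.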