almogbaku 5 hours ago

Hi @jldugger

Great points, but I think there's a domain confusion here. You're describing infra/code RCA. Kelet does AI agent quality RCA: the agent returns a 200 OK but gives the wrong answer.

The signal space is different. We're working with structured LLM traces plus explicit quality signals (thumbs down, edits, eval scores), not distributed-system logs. That makes the problem much more tractable.

Your Bayesian point actually resonates — separating good from bad sessions and looking for structural differences is close to what we do. But the hypotheses aren't "100 LLM guesses + k-means." Each one is grounded in actual session data: what the user asked, what the agent did, what came back, and what the signal was.
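To make the good-vs-bad contrast concrete, here's a minimal sketch. It is not Kelet's actual pipeline. It assumes each session has already been reduced to a set of hypothetical structural feature labels (e.g. `"retrieval:empty"`) and that the quality signal has been collapsed to a good/bad flag. It then ranks features by a Laplace-smoothed likelihood ratio P(feature | bad) / P(feature | good).

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Session:
    # Hypothetical session record: structural features pulled from a trace
    # (which tool ran, whether retrieval came back empty, ...) plus a
    # quality signal collapsed to a single good/bad flag.
    features: set[str]
    is_bad: bool


def rank_features(sessions: list[Session], alpha: float = 1.0) -> list[tuple[str, float]]:
    """Rank features by smoothed likelihood ratio P(f | bad) / P(f | good)."""
    bad = [s for s in sessions if s.is_bad]
    good = [s for s in sessions if not s.is_bad]
    bad_counts = Counter(f for s in bad for f in s.features)
    good_counts = Counter(f for s in good for f in s.features)

    scores: dict[str, float] = {}
    for f in set(bad_counts) | set(good_counts):
        # Laplace smoothing: a feature absent from one group still gets a
        # nonzero probability, so the ratio never divides by zero.
        p_bad = (bad_counts[f] + alpha) / (len(bad) + 2 * alpha)
        p_good = (good_counts[f] + alpha) / (len(good) + 2 * alpha)
        scores[f] = p_bad / p_good

    # Highest ratio first: features most over-represented in bad sessions.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy data, invented for illustration only.
sessions = [
    Session({"tool:search", "retrieval:empty"}, is_bad=True),
    Session({"tool:search", "retrieval:empty"}, is_bad=True),
    Session({"tool:search"}, is_bad=False),
    Session({"tool:search"}, is_bad=False),
    Session({"tool:calculator"}, is_bad=False),
]
ranked = rank_features(sessions)
print(ranked)  # "retrieval:empty" ranks first, with a ratio of about 3.75
```

A ranking like this only surfaces correlations; the top features are candidates to turn into hypotheses, which still have to be checked against what the user asked and what the agent actually did.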

Curious about the dimensional analysis point — are you thinking about reducing the feature space before hypothesis generation?