Remix.run Logo
aloknnikhil 3 hours ago

The typical issues I have seen with LLMs / Agents tend to be reactive in their fixes. So they tend to "patch" the symptom more than "fix" the root cause. Interested to see how you solve this problem.

signalbright 3 hours ago | parent [-]

You're right! It's a big issue and I don't think there's a silver bullet.

We have an eval suite with code+telemetry fixtures and a golden RCA+patches and an LLM-as-a-Judge. So whenever we get feedback from our users and they're OK with it, we use their feedback to create an eval case (it's still quite manual since you have to calibrate the case).

We use Superlog to observe Superlog, so I often extract cases from our own errors. The PRs get better and better, but, of course, it's sort of a continuous improvement process.