signalbright 4 hours ago
You're right! It's a big issue, and I don't think there's a silver bullet. We have an eval suite with code+telemetry fixtures, golden RCA+patches, and an LLM-as-a-judge. Whenever we get feedback from our users and they're OK with it, we turn that feedback into an eval case (it's still quite manual, since each case has to be calibrated). We use Superlog to observe Superlog, so I often extract cases from our own errors. The PRs keep getting better, but of course it's a continuous improvement process.
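For anyone wondering what a setup like this could look like, here is a rough sketch, not Superlog's actual code. All names (`EvalCase`, `Candidate`, `run_suite`, `token_overlap_judge`) and the pass threshold are made up for illustration, as is the assumption that each case carries one golden RCA and one golden patch. The judge here is a deterministic token-overlap stand-in so the example runs offline; a real LLM-as-a-judge would be swapped in behind the same signature.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class EvalCase:
    """One eval case: input fixtures plus the golden (expected) output."""
    name: str
    code_fixture: str       # snapshot of the relevant source code
    telemetry_fixture: str  # logs/traces captured around the error
    golden_rca: str         # human-calibrated root-cause analysis
    golden_patch: str       # human-calibrated fix


@dataclass(frozen=True)
class Candidate:
    """What the system under test produced for a case."""
    rca: str
    patch: str


# Hypothetical interfaces: an agent turns a case into a candidate,
# and a judge scores the candidate against the golden answer in [0, 1].
Agent = Callable[[EvalCase], Candidate]
Judge = Callable[[EvalCase, Candidate], float]


def token_overlap_judge(case: EvalCase, candidate: Candidate) -> float:
    """Stand-in for an LLM judge: Jaccard overlap of RCA tokens."""
    gold = set(case.golden_rca.lower().split())
    got = set(candidate.rca.lower().split())
    if not gold and not got:
        return 1.0
    return len(gold & got) / len(gold | got)


def run_suite(
    cases: List[EvalCase],
    agent: Agent,
    judge: Judge,
    threshold: float = 0.5,  # assumed pass mark, would need calibration
) -> Dict[str, Tuple[float, bool]]:
    """Run every case through the agent and judge; return (score, passed)."""
    results: Dict[str, Tuple[float, bool]] = {}
    for case in cases:
        candidate = agent(case)
        score = judge(case, candidate)
        results[case.name] = (score, score >= threshold)
    return results
```

Adding a case from user feedback or from your own errors then amounts to appending another `EvalCase` to the list; the manual part is writing and calibrating the golden RCA and patch.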