smithclay 7 hours ago

We need more rigorous benchmarks for SRE tasks, which is much easier said than done.

The only other benchmark I've come across is https://sreben.ch/ ... certainly there must be others by now?

nyellin 4 hours ago

We publish the benchmarks for HolmesGPT (CNCF sandbox project) at https://holmesgpt.dev/development/evaluations/