smithclay 7 hours ago
We need more rigorous benchmarks for SRE tasks, which is much easier said than done. The only other benchmark I've come across is https://sreben.ch/ ... surely there must be others by now?
nyellin 4 hours ago
We publish the benchmarks for HolmesGPT (CNCF sandbox project) at https://holmesgpt.dev/development/evaluations/