| ▲ | asyncadventure 8 hours ago | |||||||
This aligns with my experience trying to automate observability tasks - AI excels at individual coding patterns but struggles with the holistic understanding needed for distributed tracing. The 29% success rate actually seems optimistic considering how OpenTelemetry requires deep context about service boundaries and business logic, not just syntactic correctness. | ||||||||
| ▲ | jakozaur 7 hours ago | parent [-] | |||||||
In this benchmark, micro-services are really small, ~300 lines, and sometimes just two of them. More realistic tasks (large codebases, more microservices) would have a lower success rate. | ||||||||
| ||||||||