| ▲ | zamadatix 7 hours ago | |
Much appreciated, but I mean more around "what do the error bars in the figure represent" than what the turn scaling itself is. | ||
| ▲ | esafak 7 hours ago | parent | next [-] | |
For the tasks in SWE-Bench Pro they obtained a distribution of agent turns, summarized as the box plot. The box likely describes the interquartile range, while the whiskers describe some other range. You'd have to read their report to be sure. https://en.wikipedia.org/wiki/Box_plot | ||
| ▲ | jsnell 7 hours ago | parent | prev [-] | |
That's a box plot, so those are not error bars but a visualization of the distribution of a metric (min, max, median, 25th percentile, 75th percentile). The benchmark consists of a bunch of tasks. The chart shows the distribution of the number of turns taken over all those tasks. | ||
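To make the five numbers concrete, here is a minimal sketch of how such a summary could be computed from per-task turn counts. The `turns` list is made-up illustrative data, not anything from the report, and `five_number_summary` is a hypothetical helper name. It assumes the whiskers are the plain min and max as described above; some plotting tools instead draw whiskers at 1.5x the interquartile range, so check the report for which convention the figure uses.

```python
import statistics


def five_number_summary(values):
    """Return min, quartiles, and max for a list of numbers.

    Assumes whiskers are the plain min/max; other box plot
    conventions (e.g. 1.5x IQR whiskers) would differ.
    """
    # quantiles(n=4) returns the three cut points: Q1, median, Q3.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
    }


# Hypothetical number of agent turns per benchmark task.
turns = [12, 18, 25, 31, 40, 47, 63, 90]
summary = five_number_summary(turns)
print(summary)
```

The box spans `q1` to `q3`, the line inside it is `median`, and under the min/max convention the whiskers reach `min` and `max`.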