ilusion 3 hours ago
I'm very curious to see a benchmark for this. I've toyed with the idea myself but haven't put in the hard work to test these hypotheses about extracting learning signal from deep-agent traces.
funfunfunction 2 hours ago
There are some benchmarks in the repo for AppWorld. Looks promising.