ilusion 3 hours ago

I'm very curious to see a benchmark for this - I've toyed with the idea myself but haven't put in the hard work to test these hypotheses about extracting learning signal from deep-agent traces.

funfunfunction 2 hours ago

There are some benchmarks in the repo for AppWorld. Looks promising.