danielvaughn 3 days ago

Yeah, I'm nowhere near ready to loosen the leash. Show me a long-running agent that can get 90% of the way to its goal, then I'll be convinced. But right now we barely even have the tools to properly evaluate such agents.