esafak 7 days ago

We really need a benchmark for agents that explores their ability-efficiency frontier.