hadlock 2 hours ago

some kind of top-level metric like average (or median) tokens per task would be useful. e.g. yes, stepfun is 5% of the price of sonnet, but does it use 1x, 10x or 1000x as many tokens to accomplish similar tasks? for example, I am willing to eat a 20% quality drop from sonnet if token use is less than 10% higher than sonnet's. if token use is 1000x, that's something I want to know.
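
To make the arithmetic concrete, here is a minimal sketch, assuming cost per task is simply price per token times tokens per task. The function names and threshold defaults are hypothetical, chosen only to mirror the numbers above (5% price, under 10% extra tokens, 20% quality drop); they are not taken from any real tool.

```python
def relative_cost_per_task(price_ratio: float, token_ratio: float) -> float:
    """Cost per task relative to the baseline model.

    price_ratio: candidate price per token / baseline price per token
    token_ratio: candidate tokens per task / baseline tokens per task
    Assumes cost per task = price per token * tokens per task.
    """
    return price_ratio * token_ratio


def is_acceptable(
    token_ratio: float,
    quality_drop: float,
    max_token_overhead: float = 0.10,  # hypothetical: under 10% more tokens
    max_quality_drop: float = 0.20,    # hypothetical: up to 20% quality drop
) -> bool:
    """Apply the trade-off rule described in the comment above."""
    return token_ratio < 1 + max_token_overhead and quality_drop <= max_quality_drop


if __name__ == "__main__":
    # A model at 5% of the baseline price, at different token multipliers.
    for token_ratio in (1, 10, 1000):
        cost = relative_cost_per_task(0.05, token_ratio)
        print(f"{token_ratio}x tokens: {cost:.2f}x baseline cost per task")
```

Under this assumption, the 5% price advantage disappears once the cheaper model needs 20x the tokens per task, which is why the token multiplier matters as much as the price.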

skysniper 10 minutes ago

added https://app.uniclaw.ai/arena/model-stats

also added per-battle stats on the battle detail page