bonsai_spool 2 days ago

> On individual tasks Claude and GPT are comparable

That is not what the first graphs show: the Anthropic models cluster at 'better' positions on the graph, and I imagine you could show that the difference is statistically significant.