Remix.run Logo
balefulboy 5 hours ago

METR's time horizon is not a reliable metric of LLM capability growth: https://www.transformernews.ai/p/against-the-metr-graph-codi...