lancebeet 2 hours ago

If benchmarks are fishy, it seems their bias would be toward better-than-expected scores for proprietary models, since the companies behind those models have more incentive to game the benchmarks.