behnamoh 4 days ago

It has become progressively easier to game benchmarks in order to appear higher in rankings. I’ve seen several models that claimed to be the best at software engineering, only to be disappointed when they couldn’t figure out the most basic coding problems. In comparison, I’ve seen models that don’t have much hype but are rock solid.

When people say AI has hit a wall, they are mainly talking about OpenAI losing its hype and its grip on state-of-the-art models.