starchild3001 6 days ago

I feel there's some "benchmark-hacking" going on with the GPT4.1 model, as its metrics on livebench.com aren't all that exciting.

- It's basically GPT4o level on average.

- More optimized for coding, but slightly inferior in other areas.

It seems to be a better model than 4o for coding tasks, but I'm not sure it will replace the current leaders: Gemini 2.5 Pro, o3-mini / o1, and Claude 3.7/3.5.