▲ | starchild3001 6 days ago | |
I feel there's some "benchmark-hacking" going on with the GPT-4.1 model, as its metrics on livebench.com aren't all that exciting. - It's basically GPT-4o level on average. - More optimized for coding, but slightly inferior in other areas. It seems to be a better model than 4o for coding tasks, but I'm not sure it will replace the current leaders: Gemini 2.5 Pro, o3-mini / o1, Claude 3.7/3.5. |