Congratulations, great work Kimi team.

Why is it that Claude is still at the top in coding? Are they heavily focused on training for coding, or is their general training so good that it performs well in coding too?

Someone please beat Opus 4.5 in coding; I want to replace it.

pokot0 2 hours ago | parent | next [-]

I don't think that kind of difference in benchmarks has any meaning at all. Your agentic coding tool and the task you are working on introduce a lot more "noise" than that small delta.

Also consider that they are all overfitting on the benchmark itself, so that may be a factor as well (which can go in either direction).

I consider the top models practically identical for coding applications (just personal experience with heavy use of both GPT5.2 and Opus 4.5).

Excited to see how this model compares in real applications. It's 1/5th of the price of top models!!

Balinares 3 hours ago | parent | prev | next [-]

I replaced Opus with Gemini Pro and it's just plain a better coder IMO. It'll restructure code to enable support for new requirements, where Opus seems to just pile on more indirection layers by default, when it doesn't outright hardcode special cases inside existing functions, or drop the cases it's failing to support from the requirements while smugly informing you that you don't need them anyway.

	tmikaeld 31 minutes ago | parent [-]

	Agreed. The problem I have with Gemini Pro and Flash is that they're extremely trigger-happy. They will run delete, move, and build commands immediately without asking (if allowed).

MattRix 3 hours ago | parent | prev [-]

Opus 4.5 only came out two months ago, and yes, Anthropic spends a lot of effort making it particularly good at coding.