Other benchmark aggregates are less favorable to GPT-OSS-120B: https://arxiv.org/abs/2508.12461

As with all of these things, it depends on your own eval suite. gpt-oss-120b works as well as o4-mini on my evals, which means I can run it via OpenRouter on Cerebras, where it's SO DAMN FAST and about 1/5th the price of o4-mini.

indigodaddy 3 days ago

For coding, how would you compare gpt-oss-120b to:

Qwen3-Coder-480B-A35B-Instruct

GLM4.5 Air

Kimi K2

DeepSeek V3 0324 / R1 0528

GPT-5 Mini

Thanks for any feedback!

petesergeant 3 days ago

I’m afraid I don’t use any of those for coding.

bigyabai 3 days ago

You're missing out. GLM 4.5 Air and Qwen3 A3B both blow OSS 120B out of the water in my experience.

indigodaddy 3 days ago

Ah, good to hear! How about Qwen3-Coder-480B-A35B-Instruct? I believe that is the free Qwen3-Coder model on OpenRouter.