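Benchmarks pasted below, with the top score in each row marked with asterisks. Overall, Qwen3-Max-Thinking is pretty competitive with the others: it has the top score on 4 of the 19 benchmarks (CEval, HMMT Nov 25, HLE with tools, ArenaHard v2), and its largest gaps to the leader are on tool use (BFCLV4, Vita Bench, Deep Planning).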

| Capability | Benchmark | GPT-5.2-Thinking | Claude-Opus-4.5 | Gemini 3 Pro | DeepSeek V3.2 | Qwen3-Max-Thinking |
| --- | --- | --- | --- | --- | --- | --- |
| Knowledge | MMLUPro | 87.4 | 89.5 | *89.8* | 85.0 | 85.7 |
| Knowledge | MMLURedux | 95.0 | 95.6 | *95.9* | 94.5 | 92.8 |
| Knowledge | CEval | 90.5 | 92.2 | 93.4 | 92.9 | *93.7* |
| STEM | GPQA | *92.4* | 87.0 | 91.9 | 82.4 | 87.4 |
| STEM | HLE | 35.5 | 30.8 | *37.5* | 25.1 | 30.2 |
| Reasoning | LiveCodeBench v6 | 87.7 | 84.8 | *90.7* | 80.8 | 85.9 |
| Reasoning | HMMT Feb 25 | *99.4* | - | 97.5 | 92.5 | 98.0 |
| Reasoning | HMMT Nov 25 | - | - | 93.3 | 90.2 | *94.7* |
| Reasoning | IMOAnswerBench | *86.3* | 84.0 | 83.3 | 78.3 | 83.9 |
| Agentic Coding | SWE Verified | 80.0 | *80.9* | 76.2 | 73.1 | 75.3 |
| Agentic Search | HLE (w/ tools) | 45.5 | 43.2 | 45.8 | 40.8 | *49.8* |
| Instruction Following & Alignment | IFBench | *75.4* | 58.0 | 70.4 | 60.7 | 70.9 |
| Instruction Following & Alignment | MultiChallenge | 57.9 | 54.2 | *64.2* | 47.3 | 63.3 |
| Instruction Following & Alignment | ArenaHard v2 | 80.6 | 76.7 | 81.7 | 66.5 | *90.2* |
| Tool Use | Tau² Bench | 80.9 | *85.7* | 85.4 | 80.3 | 82.1 |
| Tool Use | BFCLV4 | 63.1 | *77.5* | 72.5 | 61.2 | 67.7 |
| Tool Use | Vita Bench | 38.2 | *56.3* | 51.6 | 44.1 | 40.9 |
| Tool Use | Deep Planning | *44.6* | 33.9 | 23.3 | 21.6 | 28.7 |
| Long Context | AALCR | 72.7 | *74.0* | 70.7 | 65.0 | 68.7 |
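
If you want to check the "competitive" claim yourself, here is a rough Python sketch that tallies how many benchmarks each model tops. The scores are copied from the table above; the names `MODELS`, `SCORES`, and `count_top_scores` are just mine for illustration, and a `-` entry is treated as missing (`None`). Counting wins is only one crude way to compare models, so take it as a sanity check and not a ranking.

```python
# Scores copied from the table above, in column order.
# None marks a "-" (no reported score).
MODELS = [
    "GPT-5.2-Thinking",
    "Claude-Opus-4.5",
    "Gemini 3 Pro",
    "DeepSeek V3.2",
    "Qwen3-Max-Thinking",
]

SCORES = {
    "MMLUPro": [87.4, 89.5, 89.8, 85.0, 85.7],
    "MMLURedux": [95.0, 95.6, 95.9, 94.5, 92.8],
    "CEval": [90.5, 92.2, 93.4, 92.9, 93.7],
    "GPQA": [92.4, 87.0, 91.9, 82.4, 87.4],
    "HLE": [35.5, 30.8, 37.5, 25.1, 30.2],
    "LiveCodeBench v6": [87.7, 84.8, 90.7, 80.8, 85.9],
    "HMMT Feb 25": [99.4, None, 97.5, 92.5, 98.0],
    "HMMT Nov 25": [None, None, 93.3, 90.2, 94.7],
    "IMOAnswerBench": [86.3, 84.0, 83.3, 78.3, 83.9],
    "SWE Verified": [80.0, 80.9, 76.2, 73.1, 75.3],
    "HLE (w/ tools)": [45.5, 43.2, 45.8, 40.8, 49.8],
    "IFBench": [75.4, 58.0, 70.4, 60.7, 70.9],
    "MultiChallenge": [57.9, 54.2, 64.2, 47.3, 63.3],
    "ArenaHard v2": [80.6, 76.7, 81.7, 66.5, 90.2],
    "Tau2 Bench": [80.9, 85.7, 85.4, 80.3, 82.1],
    "BFCLV4": [63.1, 77.5, 72.5, 61.2, 67.7],
    "Vita Bench": [38.2, 56.3, 51.6, 44.1, 40.9],
    "Deep Planning": [44.6, 33.9, 23.3, 21.6, 28.7],
    "AALCR": [72.7, 74.0, 70.7, 65.0, 68.7],
}


def count_top_scores(scores, models):
    """Return {model: number of benchmarks where it has the highest score}."""
    wins = {model: 0 for model in models}
    for row in scores.values():
        # Ignore missing entries when looking for the row maximum.
        best = max(value for value in row if value is not None)
        for model, value in zip(models, row):
            if value == best:
                wins[model] += 1
    return wins


if __name__ == "__main__":
    for model, n in count_top_scores(SCORES, MODELS).items():
        print(f"{model}: {n}")
```

With these numbers the tally is 5 each for GPT-5.2-Thinking, Claude-Opus-4.5, and Gemini 3 Pro, 4 for Qwen3-Max-Thinking, and 0 for DeepSeek V3.2.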