Silly. If you are going to come up with a new benchmark, then at least include capable models: they have Opus and Gemini Pro, and then Qwen3-32B.
Why not qwen3-coder-480b, qwen3-235b-instruct, deepseek-v3.1, kimi-k2, GLM-4.5, gpt-oss-120b?