xcodevn 6 hours ago:
I'm not familiar with these open-source models. My bias is that they're heavily benchmaxxing and not really helpful in practice. Can someone with a lot of experience using these, as well as Claude Opus 4.5 or Codex 5.2, confirm whether they're actually on the same level, or whether they fall short in practice?

P.S. I realize Qwen3-Max-Thinking isn't actually an open-weight model (it's only accessible via API), but I'm still curious how it compares.
miroljub 6 hours ago:
I don't know where your impression of benchmaxxing comes from. Why would you assume closed models aren't benchmaxxing? Being closed and commercial, they have more incentive to fake it than the open models do.
segmondy 6 hours ago:
You're not familiar with them, yet you claim a bias. Bias based on what? I've used pretty much only open-source models for the last two years. I occasionally give OpenAI and Anthropic a try to see how good they are, but I stopped supporting them when they started calling for regulation of open models. I haven't seen folks on closed models get ahead of me; I'm keeping up just fine with these free open models.
orangebread 6 hours ago:
I haven't used Qwen3-Max yet, but my gut feeling is that they are benchmaxxing. If I were to rank the open models worth using, it'd be:

- Minimax
- GLM
- Deepseek