Remix.run Logo
plipt 3 days ago

I dont think the Flash model discussed in the article is 30B

Their benchmark table shows it beating Qwen3-235B-A22B

Does "Flash" in the name of a Qwen model indicate a model-as-a-service and not open weights?

red2awn 3 days ago | parent [-]

Flash is a closed weight version of https://huggingface.co/Qwen/Qwen3-Omni-30B-A3B-Instruct (it is 30B but with addtional training on top of the open weight release). They deploy the flash version on Qwen's own chat.

plipt 3 days ago | parent [-]

Thanks

Was it being closed weight obvious to you from the article? Trying to understand why I was confused. Had not seen the "Flash" designation before

Also 30B models can beat a semi-recent 235B with just some additional training?

red2awn 3 days ago | parent [-]

They had a Flash variant released alongside the original open weight release. It is also mentioned in Section 5 of the paper: https://arxiv.org/pdf/2509.17765

For the evals it's probably just trained on a lot of the benchmark adjacent datasets compared to the 235B model. Similar thing happened on other model today: https://x.com/NousResearch/status/1998536543565127968 (a 30B model trained specifically to do well in maths get near SOTA scores)