Remix.run Logo
gardnr 3 days ago

This is a 30B parameter MoE with 3B active parameters and is the successor to their previous 7B omni model. [1]

You can expect this model to have similar performance to the non-omni version. [2]

There aren't many open-weights omni models so I consider this a big deal. I would use this model to replace the keyboard and monitor in an application while doing the heavy lifting with other tech behind the scenes. There is also a reasoning version, which might be a bit amusing in an interactive voice chat if it pronounces the thinking tokens while working through to a final answer.

1. https://huggingface.co/Qwen/Qwen2.5-Omni-7B

2. https://artificialanalysis.ai/models/qwen3-30b-a3b-instruct

red2awn 3 days ago | parent | next [-]

This is a stack of models:

- 650M Audio Encoder

- 540M Vision Encoder

- 30B-A3B LLM

- 3B-A0.3B Audio LLM

- 80M Transformer/200M ConvNet audio token to waveform

This is a closed source weight update to their Qwen3-Omni model. They had a previous open weight release Qwen/Qwen3-Omni-30B-A3B-Instruct and a closed version Qwen3-Omni-Flash.

You basically can't use this model right now since none of the open source inference framework have the model fully implemented. It works on transformers but it's extremely slow.

olafura 3 days ago | parent | prev | next [-]

Looks like it's not open source: https://www.alibabacloud.com/help/en/model-studio/qwen-omni#...

coder543 3 days ago | parent [-]

No... that website is not helpful. If you take it at face value, it is claiming that the previous Qwen3-Omni-Flash wasn't open either, but that seems wrong? It is very common for these blog posts to get published before the model weights are uploaded.

red2awn 3 days ago | parent [-]

The previous -Flash weight is closed source. They do have weights for the original model that is slightly behind in performance https://huggingface.co/Qwen/Qwen3-Omni-30B-A3B-Instruct

coder543 3 days ago | parent [-]

Based on things I had read over the past several months, Qwen3-Flash seemed to just be a weird marketing term for the Qwen3-Omni-30B-A3B series, not a different model. If they are not the same, then that is interesting/confusing.

red2awn 3 days ago | parent [-]

It is an in-house closed weight model for their own chat platform, mentioned in Section 5 of the original paper: https://arxiv.org/pdf/2509.17765

I've seen it in their online materials too but can't seem to find it now.

gardnr 3 days ago | parent | prev | next [-]

I can't find the weights for this new version anywhere. I checked modelscope and huggingface. It looks like they may have extended the context window to 200K+ tokens but I can't find the actual weights.

pythux 3 days ago | parent [-]

They link to: https://huggingface.co/collections/Qwen/qwen3-omni-68d100a86... from the blog post but it does seem like this redirects to their main space on HF so maybe they didn't yet make the model public?

tensegrist 3 days ago | parent | prev | next [-]

> There is also a reasoning version, which might be a bit amusing in an interactive voice chat if it pronounces the thinking tokens while working through to a final answer.

last i checked (months ago) claude used to do this

plipt 3 days ago | parent | prev | next [-]

I dont think the Flash model discussed in the article is 30B

Their benchmark table shows it beating Qwen3-235B-A22B

Does "Flash" in the name of a Qwen model indicate a model-as-a-service and not open weights?

red2awn 3 days ago | parent [-]

Flash is a closed weight version of https://huggingface.co/Qwen/Qwen3-Omni-30B-A3B-Instruct (it is 30B but with addtional training on top of the open weight release). They deploy the flash version on Qwen's own chat.

plipt 3 days ago | parent [-]

Thanks

Was it being closed weight obvious to you from the article? Trying to understand why I was confused. Had not seen the "Flash" designation before

Also 30B models can beat a semi-recent 235B with just some additional training?

red2awn 3 days ago | parent [-]

They had a Flash variant released alongside the original open weight release. It is also mentioned in Section 5 of the paper: https://arxiv.org/pdf/2509.17765

For the evals it's probably just trained on a lot of the benchmark adjacent datasets compared to the 235B model. Similar thing happened on other model today: https://x.com/NousResearch/status/1998536543565127968 (a 30B model trained specifically to do well in maths get near SOTA scores)

andy_ppp 2 days ago | parent | prev | next [-]

Haha, you could hear how it’s mind thinks, maybe by putting a lot of reverb on the thinking tokens or some other effect…

andy_xor_andrew 3 days ago | parent | prev [-]

> This is a 30B parameter MoE with 3B active parameters

Where are you finding that info? Not saying you're wrong; just saying that I didn't see that specified anywhere in the linked page, or on their HF.

gardnr 29 minutes ago | parent | next [-]

I was wrong. I confused this with their open model. Looking at it more closely, it is likely an omni version of Qwen3-235B-A22B. I wonder why they benchmarked it against Qwen2.5-Omni-7B instead of Qwen3-Omni-30B-A3B.

I wish I could delete the comment.

plipt 3 days ago | parent | prev [-]

The link[1] at the top of their article to HuggingFace goes to some models named Qwen3-Omni-30B-A3B that were last updated in September. None of them have "Flash" in the name.

The benchmark table shows this Flash model beating their Qwen3-235B-A22B. I dont see how that is possible if it is a 30B-A3B model.

I don't see a mention of a parameter count anywhere in the article. Do you? This may not be an open weights model.

This article feels a bit deceptive

1: https://huggingface.co/collections/Qwen/qwen3-omni