drapado 14 hours ago

Cool! Pity they are not releasing a smaller A3B MoE model

ilc 11 hours ago | parent | next [-]

https://huggingface.co/Qwen/Qwen3-Omni-30B-A3B-Instruct

daemonologist 14 hours ago | parent | prev [-]

Their A3B Omni paper mentions that the Omni model at that size outperformed the (unreleased, I guess) VL model. Edit: I see now that there is no Omni-235B-A22B; disregard the following. ~~Which is interesting - I'd have expected the larger model to have more weights to "waste" on additional modalities and thus for the opposite to be true (or for the VL to outperform in both cases, or for both to benefit from knowledge transfer).~~

Relevant comparison is on page 15: https://arxiv.org/abs/2509.17765