drapado 14 hours ago
Cool! Pity they are not releasing a smaller A3B MoE model.
ilc 11 hours ago
daemonologist 14 hours ago
Their A3B Omni paper mentions that the Omni model at that size outperformed the (unreleased, I guess) VL model. Edit: I see now that there is no Omni-235B-A22B; disregard the following. ~~Which is interesting - I'd have expected the larger model to have more weights to "waste" on additional modalities, and thus for the opposite to be true (or for the VL to outperform in both cases, or for both to benefit from knowledge transfer).~~ The relevant comparison is on page 15: https://arxiv.org/abs/2509.17765