MoE is excellent for unified-memory inference hardware like the NVIDIA DGX Spark, Apple Mac Studio, etc. The large memory pool means you can fit quite a few billion parameters, while the small active experts keep those tokens flowing fast — you pay memory-bandwidth cost only for the experts routed per token, not for the whole model.
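A quick back-of-envelope sketch of why this works: total parameters set the memory you need to *hold* the model, but only the active parameters get *read* per decode step, so the bandwidth-bound token rate depends on the active count. All the numbers below (model sizes, bandwidth, quantization) are illustrative assumptions, not specs for any particular box:

```python
# Rough MoE-on-unified-memory estimate. Illustrative numbers only.

def moe_estimate(total_params_b, active_params_b, bytes_per_param, mem_bandwidth_gbs):
    """Return (memory footprint in GB, bandwidth-bound decode ceiling in tokens/s)."""
    footprint_gb = total_params_b * bytes_per_param   # weights you must HOLD in memory
    active_gb = active_params_b * bytes_per_param     # weights you must READ per token
    tokens_per_s = mem_bandwidth_gbs / active_gb      # ideal ceiling; ignores KV cache, overhead
    return footprint_gb, tokens_per_s

# Hypothetical 100B-total / 5B-active MoE at 8-bit on a ~270 GB/s unified-memory machine
fp, tps = moe_estimate(total_params_b=100, active_params_b=5,
                       bytes_per_param=1.0, mem_bandwidth_gbs=270)
print(f"weights: {fp:.0f} GB, decode ceiling: ~{tps:.0f} tok/s")
```

A dense 100B model on the same machine would have to stream all 100 GB per token (~2.7 tok/s ceiling), which is the whole appeal of MoE on these big-memory, modest-bandwidth boxes.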