If you don't care about how it's architected, why do you care about size? Compare it to Q3.5 397B-A17B.
Just like smaller models are a speed/cost optimization, so is MoE.
G4 26B-A4B runs at about 150 t/s on a 4090/5090 and about 80 t/s on an M5 Max. Q3.5 35B-A3B is comparably fast. They are flash-lite/nano class models.
G4 31B, despite only a small increase in total parameter count, is over 5 times slower, because as a dense model it uses all of its parameters for every token. Q3.5 27B is comparably slow. They approximate flash/mini class models (I believe proprietary models in this class are closer in size to Q3.5 122B-A10B or Llama 4 Scout 109B-A17B).
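A back-of-envelope way to see why active params matter more than total params for speed: single-stream decoding is mostly memory-bandwidth bound, so each token costs roughly one read of the *active* weights. The sketch below assumes 4-bit weights (0.5 bytes/param) and ~1008 GB/s of bandwidth (the RTX 4090 spec figure), and ignores KV cache, attention and kernel overhead, so it gives ceilings, not the measured numbers above. The function name and parameters are just for illustration.

```python
# Rough, memory-bandwidth-bound ceiling for single-stream decode speed.
# Assumption: every generated token reads each *active* weight once, so
# tokens/s ~ bandwidth / (active_params * bytes_per_param).
# KV-cache reads, attention and kernel overheads are ignored.

def decode_tokens_per_s(active_params_b: float, bandwidth_gb_s: float,
                        bytes_per_param: float = 0.5) -> float:
    """Upper-bound tokens/s. active_params_b in billions; 0.5 bytes ~ 4-bit weights."""
    bytes_per_token = active_params_b * 1e9 * bytes_per_param
    return bandwidth_gb_s * 1e9 / bytes_per_token

# Hypothetical setup: ~1008 GB/s (RTX 4090 spec), 4-bit quantized weights.
BW = 1008.0
moe = decode_tokens_per_s(4.0, BW)     # 26B-A4B: only ~4B params active per token
dense = decode_tokens_per_s(31.0, BW)  # 31B dense: all 31B params active per token

print(f"MoE   ceiling: {moe:.0f} t/s")
print(f"dense ceiling: {dense:.0f} t/s")
print(f"ratio: {moe / dense:.2f}x")
```

The idealized ratio is just 31/4 (about 7.75x). The real-world gap ("over 5 times") is smaller but points the same way, since overheads like attention and KV-cache reads don't shrink with the active parameter count.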