moffkalast 2 hours ago

According to the spec [0] it seems to apply to both, though maybe the spec is wrong.

128 sounds really tiny; I wonder if they mean some kind of blocks?

[0] https://huggingface.co/XiaomiMiMo/MiMo-V2.5-Pro-FP4-DFlash#4...

E-Reverance 2 hours ago

No

> It uses 384 routed experts (top-8) with hybrid attention (full-attention + sliding-window 128 at 6:1 ratio) over 70 layers (1 dense + 69 MoE)

https://recipes.vllm.ai/XiaomiMiMo/MiMo-V2.5-Pro
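To make the quoted layout concrete, here's a rough Python sketch. It assumes the 128 is a window of 128 tokens, which is how sliding-window sizes are usually stated, though the quote doesn't spell it out. It also assumes the 6:1 ratio means every 7th layer uses full attention. The exact placement of the full-attention layers is a guess, and `layer_schedule` / `can_attend` are hypothetical names, not anything from the model card or vLLM. Under that reading, 70 layers split into 60 sliding-window and 10 full-attention layers, and top-8 routing activates 8 of 384 routed experts (about 2%) per token in each MoE layer.

```python
# Hypothetical sketch of the layout in the quote above.
# Assumptions (NOT from the model card): every 7th layer is full attention,
# and the sliding window of 128 is counted in tokens, not blocks.

NUM_LAYERS = 70     # 1 dense + 69 MoE
WINDOW = 128        # sliding-window size, assumed to be in tokens
SWA_PER_FULL = 6    # 6:1 ratio of sliding-window to full-attention layers
NUM_EXPERTS = 384   # routed experts per MoE layer
TOP_K = 8           # routed experts selected per token


def layer_schedule(num_layers: int = NUM_LAYERS) -> list[tuple[str, str]]:
    """Return (ffn_type, attention_type) for each layer under the assumed pattern."""
    schedule = []
    for i in range(num_layers):
        ffn = "dense" if i == 0 else "moe"
        # Assumed placement: layers 6, 13, 20, ..., 69 use full attention.
        attn = "full" if (i + 1) % (SWA_PER_FULL + 1) == 0 else "sliding"
        schedule.append((ffn, attn))
    return schedule


def can_attend(query_pos: int, key_pos: int, attn: str, window: int = WINDOW) -> bool:
    """Causal check; sliding-window layers only see the last `window` tokens."""
    if key_pos > query_pos:
        return False  # causal: never attend to future tokens
    if attn == "full":
        return True
    return query_pos - key_pos < window  # window includes the query token itself


if __name__ == "__main__":
    sched = layer_schedule()
    n_full = sum(1 for _, a in sched if a == "full")
    n_moe = sum(1 for f, _ in sched if f == "moe")
    print(f"full-attention layers: {n_full}, sliding-window layers: {len(sched) - n_full}")
    print(f"MoE layers: {n_moe}")
    print(f"active routed experts per token: {TOP_K}/{NUM_EXPERTS} = {TOP_K / NUM_EXPERTS:.2%}")
    # In a sliding layer, token 1000 can see token 873 but not token 872.
    print(can_attend(1000, 873, "sliding"), can_attend(1000, 872, "sliding"))
```

If the window really is 128 tokens, the sliding layers only look very locally, and long-range context has to come through the 10 full-attention layers. That is the usual trade-off behind these hybrid schemes: most layers keep a small, fixed-size KV cache.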