zozbot234 2 days ago
MoE models will have far more world knowledge than dense models with the same number of active parameters. MoE is a no-brainer if your inference setup is ultimately limited by compute or memory bandwidth rather than total memory footprint, or alternatively if it has fast, high-bandwidth access to lower-tier storage to fetch cold expert weights from on demand.
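The active-vs-total distinction above is the whole trick: every expert's weights exist (and hold knowledge), but each token only touches a few of them. A toy sketch of top-k routing, with made-up names and sizes (not any real model's architecture):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy MoE layer: 8 experts total, but only the top-2 are active per token.
# All dimensions and variable names here are illustrative.
n_experts, d_model, top_k = 8, 16, 2

router_w = rng.standard_normal((d_model, n_experts))
experts = [rng.standard_normal((d_model, d_model)) for _ in range(n_experts)]

def moe_forward(x):
    """Route a single token vector through its top-k experts."""
    logits = x @ router_w
    active = np.argsort(logits)[-top_k:]          # indices of the winning experts
    w = np.exp(logits[active])
    gates = w / w.sum()                           # softmax over the winners only
    # Only top_k expert matrices are ever multiplied: total parameters
    # (and world knowledge) scale with n_experts, but per-token compute
    # scales with top_k -- the weights of the other experts stay cold.
    y = sum(g * (x @ experts[i]) for g, i in zip(gates, active))
    return y, active

x = rng.standard_normal(d_model)
y, active = moe_forward(x)
print(f"{len(active)} of {n_experts} experts touched")
```

This is why cold-expert weights can live in slower storage: per token, 6 of the 8 matrices above are never read.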
regularfry 2 days ago | parent
Yes, this. I can run the 122B Qwen3.5 MoE usably on one 4090 plus 64 GB of RAM. That's a monster of a model, comparatively speaking.
aitchnyu 2 days ago | parent
Tangential: I'm a newbie. Can you name the concept of partitioning weights so we don't need to load the whole thing?