Galanwe 2 hours ago

The 5090 is crap for inference, unless you like dummy models; sure, those will run at light speed. All the rage nowadays is MoE with 500B-1T parameters.

zozbot234 7 minutes ago

MoE is fine. You can put the shared weights on the 5090 (they will fit handily even for the largest models) and the expert weights in CPU RAM, possibly with weight offloading from storage.
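
A rough back-of-the-envelope sketch of that split, in Python. The numbers are illustrative assumptions, not measurements of any real model: a hypothetical 1T-parameter MoE with 20B shared (non-expert) parameters, 4-bit quantization (0.5 bytes per parameter), and a few GiB of VRAM held back for KV cache and activations. `plan_placement` and `MoEPlacement` are made-up names for this example.

```python
from dataclasses import dataclass

GIB = 1024 ** 3  # bytes per GiB


@dataclass
class MoEPlacement:
    shared_gib: float         # attention, embeddings, shared layers -> GPU
    expert_gib: float         # routed expert weights -> CPU RAM / storage
    shared_fits_on_gpu: bool


def plan_placement(total_params, shared_params, bytes_per_param,
                   vram_gib=32.0, vram_reserve_gib=4.0):
    """Estimate memory for shared vs. expert weights.

    vram_gib defaults to the 5090's 32 GB; vram_reserve_gib is an
    assumed allowance for KV cache and activations, not a measured value.
    """
    if not 0 <= shared_params <= total_params:
        raise ValueError("shared_params must be between 0 and total_params")
    shared_gib = shared_params * bytes_per_param / GIB
    expert_gib = (total_params - shared_params) * bytes_per_param / GIB
    fits = shared_gib <= vram_gib - vram_reserve_gib
    return MoEPlacement(shared_gib, expert_gib, fits)


# Hypothetical 1T-parameter MoE, 20B shared parameters, 4-bit weights.
plan = plan_placement(total_params=1e12, shared_params=20e9,
                      bytes_per_param=0.5)
print(f"shared: {plan.shared_gib:.1f} GiB, "
      f"experts: {plan.expert_gib:.1f} GiB, "
      f"shared fits on GPU: {plan.shared_fits_on_gpu}")
```

Under these assumptions the shared weights take under 10 GiB while the experts need several hundred GiB, which is why the experts end up in system RAM or on storage. Whether the result is fast enough depends on memory and storage bandwidth, which this sketch does not model.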