DennisP 4 days ago

No CUDA, 1.6T parameters but only 49B active... does that mean you can run it efficiently on a 64GB MacBook?

segmondy 4 days ago

No, you need enough memory to hold the whole model, not just the active parameters. But it does mean you can keep the most important tensors on a smaller GPU. So you could run it on a PC with, say, two 32GB RTX 5090s and 1TB+ of system RAM.
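A quick back-of-the-envelope check of that answer, as a sketch. The thread doesn't say what precision the weights ship in, so the bit widths below are assumptions, as is counting weights only (no KV cache or activations) and treating 1 GB as 1e9 bytes. `weight_gb`, `fits`, `MACBOOK_GB` and `PC_GB` are hypothetical names for illustration.

```python
# Back-of-the-envelope weight memory for a 1.6T-total / 49B-active MoE.
# Assumptions (not from the thread): weights only (no KV cache or
# activations), uniform precision, and 1 GB = 1e9 bytes.

TOTAL_PARAMS = 1.6e12   # every expert has to live somewhere (VRAM, RAM, or disk)
ACTIVE_PARAMS = 49e9    # parameters actually used for a single token


def weight_gb(params: float, bits_per_param: float) -> float:
    """Storage for `params` weights at `bits_per_param` bits each, in GB."""
    return params * bits_per_param / 8 / 1e9


def fits(params: float, bits_per_param: float, budget_gb: float) -> bool:
    """True if the weights fit in `budget_gb` of memory."""
    return weight_gb(params, bits_per_param) <= budget_gb


MACBOOK_GB = 64           # unified memory on the laptop in the question
PC_GB = 2 * 32 + 1000     # two 32GB GPUs plus ~1TB of system RAM

if __name__ == "__main__":
    for bits in (16, 8, 4):
        print(
            f"{bits:>2}-bit: total {weight_gb(TOTAL_PARAMS, bits):,.0f} GB, "
            f"active {weight_gb(ACTIVE_PARAMS, bits):,.1f} GB, "
            f"fits MacBook: {fits(TOTAL_PARAMS, bits, MACBOOK_GB)}, "
            f"fits PC: {fits(TOTAL_PARAMS, bits, PC_GB)}"
        )
```

Under these assumptions, even at 4 bits per weight the full model is about 800 GB, which is why a 64GB machine can't hold it but a box with ~1TB of RAM can. The 49B active slice at 4 bits is about 24.5 GB, small enough to sit in the combined VRAM of two 32GB cards.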

leodavi 4 days ago

Probably not. Based on my understanding of MoE, the set of active parameters can change from token to token, so in the worst case (unlikely in practice, but it frames the problem) you'd be streaming 49B parameters from SSD for every output token...
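To put a rough number on the streaming worst case, here's a sketch that only counts SSD read time and ignores compute. The 4-bit weights, the ~7 GB/s sustained read speed, and the 10% miss rate are all hypothetical figures chosen for illustration, not anything stated in the thread. `miss_fraction` stands for the share of a token's active weights that aren't already cached in RAM/VRAM; 1.0 is the worst case described above.

```python
# Lower bound on time per token when expert weights must stream from SSD.
# Assumptions (hypothetical, not from the thread): 4-bit weights, an SSD
# sustaining ~7 GB/s, and no overlap tricks. Real routing tends to reuse
# experts across tokens, so miss_fraction < 1.0 in practice.

ACTIVE_PARAMS = 49e9


def seconds_per_token(active_params: float, bits_per_param: float,
                      ssd_gb_per_s: float, miss_fraction: float = 1.0) -> float:
    """Seconds spent just reading the active weights not already in memory."""
    if not 0.0 <= miss_fraction <= 1.0:
        raise ValueError("miss_fraction must be between 0 and 1")
    bytes_needed = active_params * bits_per_param / 8 * miss_fraction
    return bytes_needed / (ssd_gb_per_s * 1e9)


if __name__ == "__main__":
    worst = seconds_per_token(ACTIVE_PARAMS, 4, 7.0)          # every active weight streamed
    partial = seconds_per_token(ACTIVE_PARAMS, 4, 7.0, 0.1)   # hypothetical 10% miss rate
    print(f"worst case: {worst:.2f} s/token ({1 / worst:.2f} tok/s)")
    print(f"10% misses: {partial:.2f} s/token ({1 / partial:.2f} tok/s)")
```

With those assumed numbers the worst case works out to about 3.5 seconds per token just for I/O, so whether this is usable hinges almost entirely on how often the next token's experts are already in memory.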