DennisP 4 days ago
No CUDA, 1.6T parameters but with 49B active... Does that mean you can run it efficiently on a 64GB MacBook?
segmondy 4 days ago
No, you need as much RAM as the total model. But it means you can load the most important tensors onto a smaller GPU. So you can run it on a PC with, say, two 32GB RTX 5090s and 1TB+ of system RAM.
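To put rough numbers on the RAM requirement, here is a back-of-envelope sketch. The parameter counts come from the thread; the storage widths (16, 8, and 4 bits per parameter) are assumptions, since the thread does not say how the weights are quantized, and the estimate covers weights only, not the KV cache or activations.

```python
# Back-of-envelope weight footprint for a mixture-of-experts model.
TOTAL_PARAMS = 1.6e12   # total parameters, from the thread
ACTIVE_PARAMS = 49e9    # parameters active per token, from the thread

# Assumed storage widths in bits per parameter (not stated in the thread).
BITS_PER_PARAM = {"fp16": 16, "int8": 8, "int4": 4}


def weight_gb(params: float, bits: int) -> float:
    """Size of the weights alone, in decimal gigabytes."""
    return params * bits / 8 / 1e9


for name, bits in BITS_PER_PARAM.items():
    total = weight_gb(TOTAL_PARAMS, bits)
    active = weight_gb(ACTIVE_PARAMS, bits)
    print(f"{name}: total {total:,.0f} GB, active {active:,.1f} GB")
```

Under these assumptions, even 4-bit weights come to about 800 GB in total, which is in line with the 1TB+ figure above and far beyond 64GB.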
leodavi 4 days ago
Probably not. Based on my understanding of MoE, the active parameter set may change from token to token, so in the worst case (unlikely in a real scenario, but it frames the problem) you'd be streaming 49B parameters from SSD for every output token...
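The worst case described above can be turned into a rough per-token latency. This is only a sketch: the 49B figure is from the thread, while the 4-bit weight width and the 5 GB/s sustained SSD read rate are hypothetical values picked for illustration, and the function name is made up for this example.

```python
ACTIVE_PARAMS = 49e9  # parameters active per token, from the thread


def worst_case_seconds_per_token(active_params: float,
                                 bits_per_param: int,
                                 read_gb_per_s: float) -> float:
    """Time to stream every active parameter from SSD for one token.

    Assumes nothing is cached between tokens, which is the worst case.
    """
    bytes_per_token = active_params * bits_per_param / 8
    return bytes_per_token / (read_gb_per_s * 1e9)


# Assumed values (not from the thread): 4-bit weights, 5 GB/s SSD reads.
secs = worst_case_seconds_per_token(ACTIVE_PARAMS, bits_per_param=4,
                                    read_gb_per_s=5.0)
print(f"{secs:.1f} s per token ({1 / secs:.2f} tokens/s)")
```

With those assumed numbers, the worst case is about 4.9 seconds per output token. Any experts that stay resident in RAM between tokens would reduce this, which is why it is an upper bound rather than a prediction.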