zozbot234 20 hours ago
With MoE models, if the complete weights (inactive experts included) almost fit in RAM, you can load them via mmap, and the expert weights that are not resident will be streamed from disk when needed. There's obviously a slowdown, but it is quite gradual, and it matters even less if you use fast storage.
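To make the mmap idea concrete, here is a minimal sketch using only the Python standard library. It is a toy, not how any particular inference runtime lays out its files: the flat file layout, the sizes, and the names `write_dummy_weights` and `MappedExperts` are all assumptions made up for illustration. The point is only that mapping the file costs almost nothing up front, and the OS reads pages from disk the first time an expert's slice is actually touched.

```python
import mmap
from array import array

# Hypothetical layout: experts stored back to back as native float32 blocks.
NUM_EXPERTS = 8
FLOATS_PER_EXPERT = 1024


def write_dummy_weights(path):
    """Write a toy weight file in which expert i is filled with float(i)."""
    with open(path, "wb") as f:
        for i in range(NUM_EXPERTS):
            array("f", [float(i)] * FLOATS_PER_EXPERT).tofile(f)


class MappedExperts:
    """Read-only view over the weight file, backed by mmap."""

    def __init__(self, path):
        self._file = open(path, "rb")
        # Length 0 maps the whole file. No weight data is read here; the OS
        # pages it in lazily when the mapped memory is accessed.
        self._map = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)
        self._view = memoryview(self._map).cast("f")

    def expert(self, index):
        """Return a zero-copy view of one expert's weights."""
        if not 0 <= index < NUM_EXPERTS:
            raise IndexError(f"expert index out of range: {index}")
        start = index * FLOATS_PER_EXPERT
        return self._view[start:start + FLOATS_PER_EXPERT]

    def close(self):
        # Views returned by expert() must be released before calling this,
        # otherwise closing the mapping raises BufferError.
        self._view.release()
        self._map.close()
        self._file.close()
```

Usage would look like `experts = MappedExperts(path)` followed by `experts.expert(3)` for whichever expert the router picks. Experts that are never selected never have to be read from disk, and when RAM is tight the OS can evict cold pages and re-read them later, which is where the gradual slowdown comes from.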
htrp 8 hours ago
Any good packages you recommend for this?