zozbot234 20 hours ago

With MoE models, if the complete weights (inactive experts included) almost fit in RAM, you can set up mmap and whatever isn't resident will be streamed from disk when needed. There's obviously a slowdown, but it degrades gradually rather than falling off a cliff, and it matters even less if you use fast storage.
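
To make the mechanism concrete, here is a toy sketch of demand paging via mmap, using only the Python standard library. It is not how any particular inference runtime lays out its files: the flat per-expert layout, the sizes, and the names `write_toy_weights`, `read_expert`, and `demo` are all made up for illustration.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical layout: experts stored back to back as float32 blocks.
NUM_EXPERTS = 8
FLOATS_PER_EXPERT = 1024
EXPERT_BYTES = FLOATS_PER_EXPERT * 4  # 4 bytes per float32


def write_toy_weights(path):
    """Write NUM_EXPERTS contiguous blocks; expert i is filled with float(i)."""
    with open(path, "wb") as f:
        for i in range(NUM_EXPERTS):
            block = [float(i)] * FLOATS_PER_EXPERT
            f.write(struct.pack(f"<{FLOATS_PER_EXPERT}f", *block))


def read_expert(mm, expert_id):
    """Decode one expert's block from the mapping.

    The OS faults pages in on access (possibly with some readahead),
    so experts that are never read need not occupy RAM.
    """
    start = expert_id * EXPERT_BYTES
    return struct.unpack_from(f"<{FLOATS_PER_EXPERT}f", mm, start)


def demo():
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        write_toy_weights(path)
        # Mapping the file reserves address space; it does not read it all in.
        with open(path, "rb") as f, \
                mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            weights = read_expert(mm, 3)  # pretend expert 3 was routed to
            return len(mm), weights[0], weights[-1]
    finally:
        os.remove(path)
```

Calling `demo()` maps the whole file but only decodes expert 3. In a real setup the page cache decides what stays resident, which is why the slowdown is gradual: frequently routed experts tend to stay in RAM and only the cold ones hit the disk.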

htrp 8 hours ago

any good packages you recommend for this?