zozbot234 4 hours ago
It will certainly benefit from having the full model in memory, but AIUI if you run from system memory and mmap the expert weights, you can execute the model with only enough RAM for the active parameters. It's just unbearably slow, since it has to swap in new experts for every token. So the more memory you have beyond that minimum, the more inactive but frequently used experts can stay resident in RAM, and the better the performance.
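A toy way to see the "more RAM, fewer swaps" effect: treat RAM as an LRU cache of experts and count how many expert loads miss it (each miss standing in for a page-in from disk). This is only a sketch under assumptions of my own: a single MoE layer, a made-up Zipf-like routing distribution, and LRU eviction as a rough stand-in for the OS page cache. None of it reflects any particular model or runtime.

```python
from collections import OrderedDict
import random


def count_expert_loads(trace, cache_capacity):
    """Count expert loads that miss an LRU cache holding `cache_capacity` experts.

    `trace` is a hypothetical sequence of per-token lists of routed expert IDs.
    Each miss stands in for reading that expert's weights from disk via mmap.
    """
    cache = OrderedDict()  # expert_id -> None, least recently used first
    loads = 0
    for token_experts in trace:
        for expert in token_experts:
            if expert in cache:
                cache.move_to_end(expert)  # hit: mark as most recently used
            else:
                loads += 1  # miss: expert must be paged in from disk
                cache[expert] = None
                if len(cache) > cache_capacity:
                    cache.popitem(last=False)  # evict the least recently used expert
    return loads


def skewed_trace(num_tokens, num_experts, top_k, seed=0):
    """Hypothetical routing trace in which low-numbered experts are more popular."""
    rng = random.Random(seed)
    weights = [1.0 / (i + 1) for i in range(num_experts)]  # assumed Zipf-like skew
    trace = []
    for _ in range(num_tokens):
        chosen = set()
        while len(chosen) < top_k:  # draw until top_k distinct experts are routed
            chosen.add(rng.choices(range(num_experts), weights=weights)[0])
        trace.append(sorted(chosen))
    return trace


if __name__ == "__main__":
    trace = skewed_trace(num_tokens=1000, num_experts=64, top_k=8)
    # capacity == top_k is the "only enough RAM for the active experts" case
    for capacity in (8, 16, 32, 64):
        print(capacity, count_expert_loads(trace, capacity))
```

Because LRU has the inclusion property, a bigger cache never causes more misses on the same trace. Once every expert fits, only the first load of each one goes to disk.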
EnPissant 39 minutes ago
The ability to stream weights from disk has nothing to do with whether a model is MoE or not. You can always do this, and it will be unusable either way.
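A back-of-envelope bound shows why streaming is slow regardless of architecture. If every token's weights have to come off disk, decode speed can't exceed disk bandwidth divided by bytes read per token. All numbers here are hypothetical: a 7 GB/s drive, 4-bit weights, a 70B dense model, and an MoE with 20B active parameters and no caching. The bound also ignores latency and compute, so real throughput would be lower.

```python
def weight_bytes(num_params, bits_per_param):
    """Bytes needed to hold `num_params` weights at `bits_per_param` bits each."""
    return num_params * bits_per_param / 8


def max_tokens_per_second(bytes_read_per_token, disk_bytes_per_second):
    """Upper bound on decode speed when each token's weights are streamed from disk."""
    return disk_bytes_per_second / bytes_read_per_token


DISK_BW = 7e9  # hypothetical sequential read bandwidth, bytes/s

dense = max_tokens_per_second(weight_bytes(70e9, 4), DISK_BW)  # all weights per token
moe = max_tokens_per_second(weight_bytes(20e9, 4), DISK_BW)    # active weights only, uncached

print(f"dense: {dense:.2f} tok/s, MoE (uncached): {moe:.2f} tok/s")
# prints "dense: 0.20 tok/s, MoE (uncached): 0.70 tok/s"
```

Under these assumed numbers, MoE reads fewer bytes per token, but both cases stay well below one token per second.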