Dylan16807 16 hours ago
You need all the weights every token, so even with optimal splitting the fraction of the weights you can farm out to an SSD is proportional to how fast your SSD is compared to your RAM. You'd need to be in a weirdly compute-limited situation before you can replace significant amounts of RAM with SSD, unless I'm missing something big.

> MoE architecture should help quite a bit here.

In that you're actually using a smaller model and swapping between them less frequently, sure.
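To put rough numbers on the bandwidth argument above, here is a minimal sketch. It assumes a dense model where every weight is read once per token, that RAM and SSD reads run fully in parallel, and that decoding is purely bandwidth-bound. The function names and the bandwidth figures are illustrative placeholders, not measurements of any real hardware.

```python
def optimal_ssd_fraction(ram_gbps: float, ssd_gbps: float) -> float:
    """Fraction of the weights to keep on SSD so that the RAM read and
    the SSD read finish at the same time (assumes parallel reads)."""
    return ssd_gbps / (ram_gbps + ssd_gbps)


def seconds_per_token(model_gb: float, ram_gbps: float, ssd_gbps: float,
                      ssd_fraction: float) -> float:
    """Per-token time when the slower of the two reads is the bottleneck."""
    ram_time = model_gb * (1.0 - ssd_fraction) / ram_gbps
    ssd_time = model_gb * ssd_fraction / ssd_gbps
    return max(ram_time, ssd_time)


if __name__ == "__main__":
    # Hypothetical figures chosen only for illustration.
    model_gb, ram_gbps, ssd_gbps = 70.0, 100.0, 5.0

    best = optimal_ssd_fraction(ram_gbps, ssd_gbps)
    print(f"optimal SSD fraction: {best:.3f}")
    for fraction in (0.0, best, 0.25, 0.5):
        t = seconds_per_token(model_gb, ram_gbps, ssd_gbps, fraction)
        print(f"ssd_fraction={fraction:.3f} -> {t:.2f} s/token")
```

Under these assumptions the break-even fraction is ssd / (ram + ssd), so an SSD that is 20x slower than RAM can hold only about 5% of the weights before it becomes the bottleneck. Pushing more onto the SSD makes the per-token time grow linearly with the SSD share.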