ryandrake 3 hours ago:
Yeah, I'm also kind of jealous of Apple folks with their unified RAM. On a traditional homelab setup with gobs of system RAM and a GPU with relatively little VRAM, all that system RAM sits there useless for running LLMs.
zozbot234 3 hours ago (reply):
That "traditional" setup is actually the recommended one for running large MoE models: keep the expert weights in system RAM and leave the shared and routing layers on the GPU to the extent feasible. You can even go larger than system RAM via mmap, though at a non-trivial cost in throughput.
khimaros 2 hours ago (reply):
Strix Halo is another option: AMD's unified-memory APUs let the iGPU address a large slice of system RAM.