abhikul0 10 hours ago
Macs have unified memory, so 36GB is 36GB for everything: GPU and CPU alike.
zozbot234 10 hours ago
CPU-MoE still helps with mmap. It should not overly hurt token-gen speed on the Mac, since the CPU has access to most (though not all) of the unified memory bandwidth, which is the bottleneck.
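To make the bandwidth point concrete, here is a rough sketch of the usual back-of-the-envelope bound: if each generated token has to read every active weight once, decode speed cannot exceed bandwidth divided by bytes read per token. The function name and all the numbers (4B active parameters, 4-bit weights, 300 GB/s) are illustrative assumptions, not measurements from this thread.

```python
def max_tokens_per_second(active_params_billion, bits_per_weight, bandwidth_gb_s):
    """Upper bound on decode speed, assuming each active weight is read once per token."""
    # Billions of parameters * bits / 8 gives gigabytes read per token.
    gb_read_per_token = active_params_billion * bits_per_weight / 8
    return bandwidth_gb_s / gb_read_per_token


# Hypothetical inputs: 4B active parameters, 4-bit weights, 300 GB/s of usable bandwidth.
print(max_tokens_per_second(4, 4, 300))  # 150.0
```

Under this simple model, a CPU that reaches most of the bandwidth the GPU does loses only a proportional share of that ceiling. Real throughput will be lower once compute, KV-cache reads, and other overheads are counted.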
mhitza 10 hours ago
For sure, I was running on autopilot with that reply. Though in Q4 I would expect it to fit, as the 24B-A4B Gemma model without CPU offloading got up to 18GB of VRAM usage.
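The "would it fit" reasoning can be sketched as simple arithmetic. This is a minimal estimate that assumes a flat 4 bits per weight and counts weights only. The helper name is made up for illustration, and real Q4 formats typically store somewhat more than 4 bits per weight, so treat the result as a lower bound.

```python
def weight_gb(total_params_billion, bits_per_weight):
    """Approximate size of the weights alone, in gigabytes."""
    return total_params_billion * bits_per_weight / 8


weights = weight_gb(24, 4)   # 24B parameters at an assumed flat 4 bits each
headroom = 36 - weights      # what remains of 36GB of unified memory
print(weights, headroom)     # 12.0 24.0
```

The gap between this 12GB estimate and the 18GB observed above would plausibly be KV cache, compute buffers, and quantization overhead, none of which this sketch models.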