mft_ 3 hours ago
(I'm not one of the people you're speaking of with a 128GB M5, but) if you want to run one of the medium-sized open-weights models (Qwen 27b, 35b, Gemma 4 26b, 31b) or larger, you get into an interesting optimisation space.

* Yes, you can run it on an older/smaller GPU plus system RAM, but performance will suffer.

* If you want optimal GPU performance you need the model plus context in VRAM, so 24GB (3090, 4090) or 32GB (5090) cards, plus a reasonably powerful system to plug them into. Ideally you'd have multiple cards working together, but for optimal performance this means either 2x 3090 or Nvidia's workstation cards.

* You can go for a 128GB Strix Halo system, but the memory bandwidth isn't great and they're becoming increasingly expensive (5.5k EUR for an HP laptop, 3.9k EUR for the GMKtec EVO-X2 mini PC).

* You can go for a 128GB DGX Spark (5k EUR+), which also has unspectacular memory bandwidth, or an RTX Spark (price unclear but probably not cheaper).

* Or go for a Mac with a decent CPU and a good amount of RAM (bandwidth varies by model, but is typically a bit better than Strix Halo/DGX Spark and worse than dedicated GPUs).

As usual with such questions, there are of course cheaper paths (if you're willing to accept the tradeoffs), but Macs are reasonable vs. the competition for these workloads.
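
For anyone who wants to sanity-check the "model plus context in VRAM" and memory bandwidth points, here is a rough back-of-envelope sketch in Python. It is not a benchmark. The function names, the 4 GB context allowance, and the 270 GB/s bandwidth figure are placeholders I made up for illustration, and the speed ceiling assumes a dense model where every weight is read once per generated token (mixture-of-experts models behave differently, and real throughput will be lower).

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (10^9 bytes)."""
    # params_billion * 1e9 weights * (bits / 8) bytes each, divided by 1e9
    return params_billion * bits_per_weight / 8


def fits_in_vram(params_billion: float, bits_per_weight: float,
                 vram_gb: float, context_overhead_gb: float) -> bool:
    """True if weights plus an assumed context allowance fit in VRAM.

    context_overhead_gb is a hypothetical allowance for KV cache and
    buffers; the real figure depends on the model and context length.
    """
    needed = weights_gb(params_billion, bits_per_weight) + context_overhead_gb
    return needed <= vram_gb


def decode_ceiling_tokens_per_s(bandwidth_gb_s: float, model_gb: float) -> float:
    """Crude upper bound on decode speed when memory bandwidth is the limit."""
    return bandwidth_gb_s / model_gb


if __name__ == "__main__":
    size = weights_gb(27, 4)  # a 27b model at 4 bits per weight
    print(f"weights: {size} GB")
    print("fits in 24GB:", fits_in_vram(27, 4, 24, 4.0))
    print("16-bit fits in 32GB:", fits_in_vram(27, 16, 32, 4.0))
    # 270 GB/s is an illustrative number, not a measured spec
    print("ceiling tok/s:", decode_ceiling_tokens_per_s(270, size))
```

The takeaway is just the shape of the tradeoff: quantisation decides whether the model fits on a single card at all, and once it fits, bandwidth divided by model size gives a quick upper bound for comparing the options above.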