| ▲ | jnovek 3 hours ago | |
I have a 64G/1T Studio with an M1 Ultra. You can probably run this model to say you’ve done it but it wouldn’t be very practical. Also I wouldn’t trust 3-bit quantization for anything real. I run a 5-bit qwen3.5-35b-A3B MoE model on my studio for coding tasks and even the 4-bit quant was more flaky (hallucinations, and sometimes it would think about running tools calls and just not run them, lol). If you decided to give it a go make sure to use the MLX over the GGUF version! You’ll get a bit more speed out of it. | ||