alexpotato 5 hours ago

I recently wrote a guide on getting the following working together on an M1 MacBook Pro (installed via brew):

- llama.cpp
- OpenCode
- Qwen3-Coder-30B-A3B-Instruct in GGUF format (Q4_K_M quantization)

It was a bit finicky to get all of the pieces working together, so hopefully the guide is also useful with these newer models: https://gist.github.com/alexpotato/5b76989c24593962898294038...
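A rough sketch of that kind of setup (this is not the gist itself; the `unsloth/...` repo name, the `:Q4_K_M` tag, and the `-hf` shorthand are assumptions about the model source and a recent llama.cpp build):

```shell
# Install llama.cpp via Homebrew (provides llama-server, llama-cli, llama-bench)
brew install llama.cpp

# Download and serve the model straight from Hugging Face.
# Repo name and :Q4_K_M tag are assumptions -- pick whichever GGUF repo you trust.
llama-server -hf unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF:Q4_K_M --port 8080

# OpenCode (or any OpenAI-compatible client) can then be pointed at
# http://localhost:8080/v1
```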
freeone3000 4 hours ago

You can also run this through LM Studio: one search and one click gets the model installed and exposed through an OpenAI-compatible API.
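LM Studio's local server speaks the OpenAI chat-completions protocol, so a quick smoke test might look like this (assumes the default port 1234 and that a model is already loaded; the model identifier is an assumption about what LM Studio names it):

```shell
# Hit the OpenAI-compatible endpoint on LM Studio's default port.
# The "model" value is an assumption -- list available models first via
# curl http://localhost:1234/v1/models
curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "qwen3-coder-30b-a3b-instruct",
        "messages": [{"role": "user", "content": "Write a haiku about RAM prices."}]
      }'
```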
kpw94 4 hours ago

On my 32 GB Ryzen desktop (recently upgraded from 16 GB before RAM prices went up another 40%), I did the same setup with llama.cpp (plus some extra Vulkan steps) and also converged on Qwen3-Coder-30B-A3B-Instruct (also Q4_K_M quantization).

On the model choice: I've tried the latest Gemma, Ministral, and a bunch of others, but Qwen was definitely the most impressive (and much faster at inference thanks to its MoE architecture), so I can't wait to try Qwen3.5-35B-A3B if it fits.

I have no clue which quantization to pick, though. I chose Q4_K_M at random; was your choice of quantization more educated?
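On picking a quantization, a back-of-the-envelope comparison is file size ≈ parameters × bits-per-weight / 8. The bits-per-weight figures below are rough averages for llama.cpp k-quants (an assumption; exact sizes vary per model):

```shell
# ~30.5B params for Qwen3-Coder-30B-A3B; bpw values are approximate averages
for q in "Q8_0 8.5" "Q5_K_M 5.7" "Q4_K_M 4.85" "Q3_K_M 3.9"; do
  set -- $q
  awk -v name="$1" -v bpw="$2" \
    'BEGIN { printf "%-7s ~%.1f GB\n", name, 30.5e9 * bpw / 8 / 1e9 }'
done
```

The usual rule of thumb is that Q4_K_M is the sweet spot below which quality starts dropping off noticeably, so picking it "at random" is a reasonable default.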
| ||||||||||||||
robby_w_g 4 hours ago

Does your MBP have 32 GB of RAM? I'm waiting on a local model that can run decently in 16 GB.
copperx 5 hours ago

How fast does it run on your M1?
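One way to get a concrete answer is llama-bench, which ships with llama.cpp and reports prompt-processing (pp) and token-generation (tg) speeds in tokens/s (the model filename here is an assumption):

```shell
# -p: prompt length to benchmark, -n: tokens to generate
llama-bench -m Qwen3-Coder-30B-A3B-Instruct-Q4_K_M.gguf -p 512 -n 128
```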