| ▲ | jherdman 4 hours ago |
| Is this sort of setup tenable on a consumer MBP or similar? |
|
| ▲ | Gareth321 18 minutes ago | parent | next [-] |
| The Mac Minis (probably 64GB RAM) are the most cost-effective. |
|
| ▲ | danw1979 4 hours ago | parent | prev | next [-] |
| Qwen’s 30B models run great on my MBP (M4, 48GB) but the issue I have is cooling - the fan exhaust is straight onto the screen, which I can’t help thinking will eventually degrade it, given the thermal cycling it would go through. A Mac Studio makes far more sense for local inference just for this reason alone. |
|
| ▲ | pitched 4 hours ago | parent | prev [-] |
| For a 30B model, you want at least 20GB of VRAM, and a 24GB MBP can’t quite allocate that much of its unified memory to the GPU. So you’d want at least a 32GB MBP. |
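A rough back-of-the-envelope supports the 20GB figure. The sketch below assumes roughly 4.5 bits per weight (typical of a Q4-style quantization) plus a flat allowance for KV cache and activations; actual usage varies with the quantization and context length:

```python
def model_memory_gb(params_b: float, bits_per_weight: float = 4.5,
                    overhead_gb: float = 2.0) -> float:
    """Rough memory estimate: quantized weights plus a flat KV-cache/activation allowance."""
    weights_gb = params_b * bits_per_weight / 8  # billions of params * bytes per param ~ GB
    return weights_gb + overhead_gb

# A 30B model at ~4.5 bits/weight comes out just under 19 GB:
print(model_memory_gb(30))
```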
| |
| ▲ | richardfey 3 hours ago | parent | next [-] |
| I have 24GB of VRAM available and haven’t yet found a decent model or combination. The last one I tried was Qwen with Continue; I guess I need to spend more time on this. |
| ▲ | zozbot234 4 hours ago | parent | prev | next [-] |
| It’s a MoE model, so I’d assume a cheaper MBP would simply result in some experts staying on the CPU? Those would still have a sizeable fraction of the unified memory bandwidth available. |
| ▲ | pitched 3 hours ago | parent [-] |
| I haven’t tried this myself, but you would still need enough non-VRAM RAM available to the CPU to offload to, right? This is a complete novice question. |
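For intuition on how that split works, here is a sketch with made-up layout numbers (the expert count and per-expert size below are purely illustrative, not Qwen’s actual config): shared attention/embedding weights stay in GPU-visible memory, and whichever experts don’t fit are served from CPU RAM.

```python
def offload_split_gb(total_b: float, expert_b: float, n_experts: int,
                     experts_on_gpu: int, bits_per_weight: float = 4.5):
    """Return (GPU GB, CPU GB) when only some MoE experts fit in GPU-visible memory."""
    gb = lambda params_b: params_b * bits_per_weight / 8
    shared_b = total_b - expert_b * n_experts        # attention/embeddings: keep on GPU
    gpu_gb = gb(shared_b + expert_b * experts_on_gpu)
    cpu_gb = gb(expert_b * (n_experts - experts_on_gpu))
    return gpu_gb, cpu_gb

# Illustrative only: 30B total, 64 experts of 0.4B each, half of them kept on GPU
gpu, cpu = offload_split_gb(30, 0.4, 64, 32)
print(f"GPU: {gpu:.1f} GB, CPU: {cpu:.1f} GB")
```

Since only a few experts activate per token, the CPU-resident slice mostly costs bandwidth on the tokens that happen to route to it, which is why this can stay usable on unified-memory Macs.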
| |
| ▲ | _blk 3 hours ago | parent | prev [-] |
| Is there any model that practically compares to Sonnet 4.6 in code and vision and runs on home-grade (12GB–24GB) cards? |
| ▲ | macwhisperer an hour ago | parent [-] |
| I’m currently running a custom Gemma4 26B MoE model on my 24GB M2... super fast, and it beat DeepSeek, ChatGPT, and Gemini in three different puzzle/code challenges I tested it on. The issue now is the low context: I can only do 2,048 tokens with my VRAM. The gap is slowly closing on the frontier models. |
|
|