Catloafdev 7 hours ago
The model they reference can easily be run with 24GB+ of VRAM, and there are other similar models that run easily on 16GB of VRAM. It's not like 128GB is a requirement here.
bitexploder 6 hours ago
My MBP is an M5 Pro with 48 GB of RAM. It runs at about 12-14 t/s at Q4, and you could probably optimize it further. RAM capacity is not the limitation; overall memory bandwidth is. Q8 is slower. 35B A3B Qwen is quite speedy, but a little less accurate. Alongside Qwen 3.6 27B dense I can squeeze in a 9B parameter model and use that for fast analysis or code scanning while the 27B is churning on a task in the background. It is tight, but totally reasonable. The real sweet spot for Qwen 27B is getting it on something like a dual 3090 system, or some other config where it can blaze at 50-80 t/s, and that currently costs well under 6K. It is a surprisingly capable model. Using something like GLM for orchestration, specs, and task farming, and then letting Qwen churn, is relatively inexpensive. Overall I recommend people try models of this class using OpenCode and some for-pay service, to experiment with them and understand how they work. I find they are very useful. Long term, I am convinced enough that if I wanted to use local models, for any number of reasons, I would be okay investing in a dual GPU box. The Mac is not fast enough for me, and the M5 Max is just too expensive relative to a GPU Linux box. Still, it is nice to have the models local ON the laptop, and it is useful for what I care about locally.
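To illustrate the bandwidth point above, here is a rough back-of-the-envelope sketch. It assumes decoding is purely memory-bandwidth bound, so every generated token has to read each active weight once. The function name and the 300 GB/s figure are hypothetical placeholders, not measured numbers for any machine in this thread, and real throughput will come in below this ceiling.

```python
def decode_tokens_per_second(active_params_billions, bits_per_weight, bandwidth_gb_s):
    """Rough ceiling on decode speed if memory bandwidth is the only bottleneck.

    Assumes each generated token reads every active weight exactly once and
    ignores KV-cache reads, compute time, and any overhead.
    """
    bytes_per_token = active_params_billions * 1e9 * bits_per_weight / 8
    return bandwidth_gb_s * 1e9 / bytes_per_token


# Hypothetical memory bandwidth in GB/s; substitute the figure for your hardware.
BANDWIDTH_GB_S = 300

# Dense 27B model at 4 bits per weight: all 27B parameters are read per token.
dense = decode_tokens_per_second(27, 4, BANDWIDTH_GB_S)

# Mixture-of-experts model with about 3B active parameters per token (the "A3B" part).
moe = decode_tokens_per_second(3, 4, BANDWIDTH_GB_S)

print(f"dense 27B ceiling: {dense:.1f} t/s")  # about 22.2 t/s with these inputs
print(f"3B-active ceiling: {moe:.1f} t/s")    # about 200.0 t/s with these inputs
```

Under these assumptions the ceiling scales inversely with bytes read per token, which is consistent with Q8 being slower than Q4 and with a 3B-active model feeling much faster than a 27B dense one on the same machine.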
CMay 5 hours ago
At 24GB, Gemma 4 31B QAT will be better and give more concise answers. This post is mostly about unquantized results, so that is less relevant here, and I can't say much about those since I haven't tested Qwen or Gemma via cloud API or unquantized locally. All I can say is that locally, quantized in a 24GB scenario, Gemma 4 31B is better in my tests, which are mostly reasoning or C programming related. Gemma 4 is the only model series at this parameter scale I've seen correctly answer some of these. One of the answers even made me re-evaluate what I thought the correct answer was, which I did not expect. When I look at the Artificial Analysis numbers, I can see that some things about Qwen 3.6 look inflated as a result of either metrics that weren't measured yet for Gemma 4 31B, or metrics that just aren't going to be relevant in a lot of the essential tasks. In a lot of the relevant metrics, Gemma 4 is either better or on par. Then, once it's all quantized, all those benchmark results will be hurt, and Gemma 4 QAT has better quantized performance. I think it's more competitive unquantized than people give it credit for, and way better quantized than people give it credit for. Qwen 3.6 clearly isn't legitimately bad, and maybe it's quite nice at fp16, but it was a disaster quantized in a 24GB scenario by comparison.
thewebguyd 7 hours ago
I'd go for at least 32GB. It'll fit in 24GB, but that leaves you little to no room for context, and that's at 4-bit quantization. If you want to run unquantized, you definitely need 128GB.
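The arithmetic behind this kind of sizing can be sketched as follows. It is only an estimate: the weight figure uses a flat bits-per-weight value (real 4-bit formats usually store somewhat more than 4 bits per weight), and the layer count, KV head count, and head dimension below are hypothetical placeholders, not the published configuration of any model named in this thread.

```python
def weight_memory_gb(params_billions, bits_per_weight):
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billions * bits_per_weight / 8


def kv_cache_gb(n_layers, n_kv_heads, head_dim, context_tokens, bytes_per_value=2):
    """KV-cache size in GB for a standard attention stack.

    The leading factor of 2 accounts for storing both keys and values.
    """
    total_bytes = 2 * n_layers * n_kv_heads * head_dim * context_tokens * bytes_per_value
    return total_bytes / 1e9


# 27B parameters at 4 bits and at 16 bits per weight.
q4_weights = weight_memory_gb(27, 4)     # 13.5 GB
fp16_weights = weight_memory_gb(27, 16)  # 54.0 GB

# Hypothetical architecture values, chosen only to show the order of magnitude.
cache = kv_cache_gb(n_layers=64, n_kv_heads=8, head_dim=128, context_tokens=32768)

print(f"Q4 weights:      {q4_weights:.1f} GB")
print(f"FP16 weights:    {fp16_weights:.1f} GB")
print(f"KV cache at 32k: {cache:.1f} GB")
print(f"Q4 + cache:      {q4_weights + cache:.1f} GB")
```

With these placeholder values, 4-bit weights plus a 32k-token cache already land close to 24GB before counting activations or runtime overhead, which is the squeeze described above. The 16-bit figure counts weights only.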
Numerlor 6 hours ago
And if you go for actual GPUs it'll run much faster. I'd say 24GB may be pushing it for context, but my 5090 with 32GB of VRAM usually gets somewhere between 60 and 100 tok/s with MTP, and 2-3k tok/s for prompt processing. I'm not sure what they cost now, but it's definitely still quite far from the MacBook's price, and there are also some other 32GB GPUs that are considerably more affordable.
nok22kon 6 hours ago
A computer with 24 GB of VRAM costs at least $3000.