sourc3 8 hours ago

I am running a quantized Qwen 3.6 9B model on my M4 Pro 48GB, and it is barely useful for basic pi.dev/cc-driven development. I think 128GB desktops are the sweet spot for actually getting meaningful work done. However, getting your hands on one of those machines is difficult at the moment.

As much fun as it is to run these things locally, don't forget that your time is not free. I am slowly migrating my use cases to OpenRouter; running the largest Qwen model costs me less than $2-3/day with serious use for personal projects.
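OpenRouter exposes an OpenAI-compatible chat-completions endpoint, so switching from a local server is mostly a matter of pointing at a different URL. A minimal stdlib-only sketch (the `qwen/qwen3-coder` model slug is an assumption; check openrouter.ai/models for the exact ID you want):

```python
import json
import os
import urllib.request

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"

def build_request(prompt, model="qwen/qwen3-coder", max_tokens=1024):
    # Model slug is an assumption; see openrouter.ai/models for real IDs.
    payload = {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }
    headers = {
        "Authorization": f"Bearer {os.environ.get('OPENROUTER_API_KEY', '')}",
        "Content-Type": "application/json",
    }
    return urllib.request.Request(
        OPENROUTER_URL, data=json.dumps(payload).encode(), headers=headers
    )

def ask(prompt):
    # Sends the request and extracts the assistant message.
    with urllib.request.urlopen(build_request(prompt)) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Because the request shape matches the OpenAI schema, most existing client code works unchanged with just the base URL and API key swapped.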

carbocation 7 hours ago | parent | next [-]

Was the choice of such a small model driven by a desire for high tok/sec? I ask because an M4 Pro 48GB machine can run larger models (if model intelligence is the thing that would make it more useful).

sourc3 7 hours ago | parent [-]

Yes, that was my goal. I also noticed a huge performance gain going from Ollama to MLX. Your mileage may vary.
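If you want to verify a backend switch like Ollama vs MLX yourself, a rough tok/sec harness is easy to write. A minimal sketch, assuming each backend is wrapped in a callable that returns the text and the number of generated tokens:

```python
import time

def tokens_per_second(generate, prompt, runs=3):
    """Rough decode-throughput check for comparing backends.

    `generate` is any callable taking a prompt and returning
    (text, n_generated_tokens). Reports the best of several runs
    to reduce warm-up and caching noise.
    """
    best = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        _, n_tokens = generate(prompt)
        elapsed = time.perf_counter() - start
        best = max(best, n_tokens / elapsed)
    return best
```

Wrap each backend's generate call in the same signature and compare the numbers on an identical prompt and max-token budget; anything else makes the comparison unfair.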

sjones671 7 hours ago | parent | prev | next [-]

Thanks for saying this. There's so much nonsense out there online about local models being better than Opus 4.7 and the like. It's just not true for regular users.

I have a brand-new, top-spec M5 MacBook Pro, and the local models I've tried are barely functional.

Yukonv 7 hours ago | parent | next [-]

What models and quantizations have you been trying? I've had great success with the larger Qwen 3.x models at 6-bit levels. 6-bit quantization is really the bare minimum to give local models a fair shot at agentic flows; once you push below that, the models get noticeably dumber from the reduced precision.
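The bit width also dictates what fits in unified memory, which is why the model/quant choice is so constrained on a 48GB machine. A back-of-the-envelope estimate for the weights alone (the ~10% overhead factor for quantization scales and runtime buffers is an assumption, and KV cache comes on top):

```python
def quantized_weight_gb(n_params_billion, bits, overhead=1.1):
    """Rough memory footprint of a model's weights at a given quantization.

    overhead (assumed ~10%) covers quantization scales/zero-points and
    runtime buffers; the KV cache for long contexts is extra.
    """
    return n_params_billion * 1e9 * bits / 8 / 1e9 * overhead

# A 30B model lands around 24.8 GB at 6-bit vs 16.5 GB at 4-bit,
# which is the difference between fitting comfortably in 48GB
# alongside an OS and tools, or not.
```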

SecretDreams 6 hours ago | parent | prev [-]

The main benefits for local are:

1) control

2) privacy

3) transparent cost model

Cloud has tremendous value for speed, plug-and-play, and performance. You need to decide how those weigh against the benefits of local, both today and, say, a year from now.
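One way to frame the transparent-cost-model point is a simple break-even calculation against a cloud budget. A sketch with illustrative numbers (the $2,000 hardware price is an assumption; the ~$2.50/day figure echoes the OpenRouter cost mentioned upthread):

```python
def breakeven_days(hardware_cost, cloud_cost_per_day, resale_value=0.0):
    """Days of use before a local machine pays for itself vs cloud spend.

    Ignores electricity and the time cost of maintaining a local setup,
    both of which push the break-even point further out.
    """
    return (hardware_cost - resale_value) / cloud_cost_per_day

# e.g. a $2,000 desktop vs ~$2.50/day of API usage:
# breakeven_days(2000, 2.5) -> 800 days, over two years of daily use
```

The control and privacy benefits don't show up in this arithmetic, which is exactly why the decision is a judgment call rather than a spreadsheet answer.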

elij 7 hours ago | parent | prev | next [-]

I'm using the 30B MoE model on the same spec with a 65k-token context as a sub-agent with tooling, and it absolutely writes decent code. I agree the dense 9B wasn't great.

hparadiz 7 hours ago | parent | prev | next [-]

How does it (the openrouter version) compare to ChatGPT 5.5 or Claude Opus 4.6?

sourc3 7 hours ago | parent [-]

Good enough. It gets 60-70% of the work I need done for a lot less money (keep in mind I'm using these for personal projects that don't generate revenue). If I were using it with the hopes of making money, I think I would just use Codex at this point.
