> I have been moving more and more to K2.7 Code and GLM-5.2 the last few weeks. They are often good enough for assistance, very fast, and cheap.

I've moved completely to local models that I run with my M1 Mac Studio (64gb ram) some time ago. But for the rare times when I feel the local, quantized Qwen3.6 isn't enough, I just connect to Openrouter and use something like Kimi, GLM or Deepseek for a fraction of the price of Anthropic et al.

▲

plasticsoprano 8 hours ago | parent | next [-]

Which quant do you use? I have a similar setup and the speed is atrocious at 4-bit.

	▲	nozzlegear 7 hours ago \| parent [-]
		I'm using 4-bit as well, with the MoE model. I also use the MLX versions which are optimized for Apple CPUs (from what I understand anyway, I'm just an LLM layman). According to my oMLX dashboard, I'm getting about 50 tokens per second out of this model – not blazing fast, but more than fast enough to be useful to me. https://huggingface.co/mlx-community/Qwen3.6-35B-A3B-OptiQ-4...

▲

kamranjon 8 hours ago | parent | prev [-]

This is the way

	▲	nimchimpsky 8 hours ago \| parent [-]
		[dead]