jasonjmcghee 4 days ago

I think the best models around right now that most people can fit (at some quantization) on their computer, whether it's an Apple Silicon Mac or a gaming PC, would be:

For non-coding: Qwen3-30B-A3B-Instruct-2507 (or the thinking variant, depending on use case)

For coding: Qwen3-Coder-30B-A3B-Instruct

---

If you have a bit more VRAM: GLM-4.5-Air, or the full GLM-4.5.

all2 4 days ago | parent [-]

Note that Qwen3 and DeepSeek are hobbled in Ollama; they cannot use tools because the tool portion of the system prompt is missing.

Recommendation: use something else to run the model. Ollama is convenient, but insufficient for tool use for these models.
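To make the failure mode concrete, here is a minimal sketch of the kind of OpenAI-style tool-calling request that runtimes like LM Studio or llama-server accept on `/v1/chat/completions`. The model id and the `get_weather` function are hypothetical placeholders; the point is that the serving runtime must render the `tools` list into the model's chat template, and if it doesn't, the model never sees the tool definitions at all.

```python
import json

# Hypothetical tool-calling request payload (OpenAI-compatible schema).
# Model id and get_weather are placeholders for illustration only.
payload = {
    "model": "qwen3-30b-a3b-instruct",  # placeholder model id
    "messages": [
        {"role": "user", "content": "What's the weather in Berlin?"}
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get the current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
}

# The runtime is responsible for injecting the `tools` array into the
# model's chat template; skipping that step is what breaks tool use.
body = json.dumps(payload)
print(json.loads(body)["tools"][0]["function"]["name"])
```

If the runtime renders this correctly, the model can respond with a `tool_calls` entry naming `get_weather`; if the tool block is dropped from the prompt, it will just answer in plain text.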

theshrike79 4 days ago | parent [-]

Could you give a recommendation that works instead of saying what doesn't work?

simonw 4 days ago | parent | next [-]

Try LM Studio or llama-server: https://simonwillison.net/2025/Aug/19/gpt-oss-with-llama-cpp...

all2 3 days ago | parent | prev [-]

I would, but I haven't found a working solution.