svachalek 4 days ago

I think I saw a project that got Ollama to run models on the Neural Engine? But it only works with tiny models. The Neural Engine seems to be extremely power efficient, but not fast enough to run LLMs with billions of parameters.

reddit_clone 3 days ago | parent

I am running Ollama with 'SimonPu/Qwen3-Coder:30B-Instruct_Q4_K_XL' on an M4 Pro MacBook Pro with 48 GB of memory.
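
If anyone wants to reproduce this, something like the following should work, assuming Ollama is installed and that model tag is still on the registry. The --verbose flag makes ollama run print token-throughput stats after each reply, which is the easiest way to check how fast it actually is on your hardware:

    # Pull the 4-bit quantized model, then chat with it interactively.
    # --verbose prints prompt/eval token rates after each response.
    ollama pull SimonPu/Qwen3-Coder:30B-Instruct_Q4_K_XL
    ollama run SimonPu/Qwen3-Coder:30B-Instruct_Q4_K_XL --verbose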

From Emacs/gptel, it seems pretty fast.
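
gptel just talks to Ollama's local HTTP API (that's what gptel-make-ollama points at), so you can sanity-check the setup outside Emacs with curl. The prompt here is made up; the /api/chat endpoint and port 11434 are Ollama's defaults:

    curl http://localhost:11434/api/chat -d '{
      "model": "SimonPu/Qwen3-Coder:30B-Instruct_Q4_K_XL",
      "messages": [{"role": "user", "content": "Write a binary search in Python."}],
      "stream": false
    }'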

I have never used the big hosted LLMs, so I don't have a direct comparison. But the model above answered coding questions in a handful of seconds.

The cost of memory (and disk) upgrades in Apple machines is exorbitant.