I have used omlx.ai with great success both to download multiple MLX models (including Gemma and Qwen) suited to my hardware AND to automagically launch both open-source and closed-source (Claude Code, Codex) harnesses using these models, all from a web or desktop UI.

You would not need to follow a blog post with oMLX, IMHO.

▲ dofm 3 hours ago | parent | next [-]

FWIW I have not, on a 64GB M1 Max, seen any advantage from oMLX specifically or MLX generally over GGUF with llama.cpp.

The Gemma 4 MLX builds I have found so far have been slower at the same quantisation and much slower with MTP.

The built-in web UI for llama.cpp is really quite good once you have chosen your model. Otherwise I quite like LM Studio for tinkering.

One thing I would say is that both Gemma 4 and Qwen 3.6 simply do not need a large chunk of the typical opencode system prompt. They are better off without it.

▲ Dotnaught 4 hours ago | parent | prev | next [-]

In case anyone is looking for a sandbox to go with oMLX and Pi: https://github.com/Dotnaught/pi-sandbox

	▲ zmmmmm 30 minutes ago | parent | next [-]
		It looks handy, but ... `sbx policy set-default open` just so the single Pi sandbox can talk to localhost? That gives me grave doubts about whether the rest of it is set up well.
	▲ dofm 3 hours ago | parent | prev [-]
		This is useful. I'm still tinkering with Multipass VMs because I need the whole VM environment anyway and I'm on Sequoia. But I'd be interested if you did anything like that with Apple's container CLI instead; sooner or later I will have to upgrade to Tahoe because I want to play with the container CLI (and apfel).

▲ fridder 5 hours ago | parent | prev [-]

It truly is the SOTA for local inference on Mac. Even when there are regressions, the dev(s) are insanely responsive. It is the most impressive open-source project I've seen in a while.

	▲ benbojangles 4 hours ago | parent [-]
		oMLX needs to incorporate macOS-native Shortcuts. macOS can almost instantly extract text from PDFs, and do a bunch of other things, using its Neural Engine (ANE), which keeps unified RAM free for LLM use. The two together would be awesome.