| ▲ | vladgur 5 hours ago | |||||||||||||
I have used omlx.ai with great success both to download multiple MLX models (including Gemma and Qwen) suited to my hardware and to automagically launch both open-source and closed-source (Claude Code, Codex) harnesses using these models, all from a web or desktop UI. You would not need to follow a blog post with omlx, IMHO. | ||||||||||||||
| ▲ | dofm 3 hours ago | parent | next [-] | |||||||||||||
FWIW I have not, on a 64GB M1 Max, seen any advantage from oMLX specifically or MLX generally over GGUF with llama.cpp. The Gemma 4 MLX builds I have found so far have been slower at the same quantisation and much slower with MTP. The built-in web UI for llama.cpp is really quite good once you have chosen your model. Otherwise I quite like LM Studio for tinkering. One thing I would say is that both Gemma 4 and Qwen 3.6 simply do not need a large chunk of the typical opencode system prompt. They are better off without it. | ||||||||||||||
| ▲ | Dotnaught 4 hours ago | parent | prev | next [-] | |||||||||||||
In case anyone is looking for a sandbox to go with oMLX and Pi: https://github.com/Dotnaught/pi-sandbox | ||||||||||||||
| ||||||||||||||
| ▲ | fridder 5 hours ago | parent | prev [-] | |||||||||||||
It truly is the SOTA for local inference on Mac. Even when there are regressions, the dev(s) are insanely responsive. It is the most impressive open-source project I've seen in a while. | ||||||||||||||
| ||||||||||||||