tripplyons 3 days ago
I can host it on my M3 laptop at around 30-40 tokens per second using mlx_lm's server command:

  mlx_lm.server --model mlx-community/Qwen3-Next-80B-A3B-Instruct-4bit --trust-remote-code --port 4444

I'm not sure whether Qwen3-Next support has made it into a release yet; when I set up the Python environment, I had to install mlx_lm from source.
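Once the server is running, it exposes an OpenAI-compatible HTTP API, so you can query it from Python. A minimal sketch, assuming the /v1/chat/completions route and the port from the command above (the prompt is just a placeholder):

  import requests

  # Local mlx_lm.server instance started with the command above.
  url = "http://localhost:4444/v1/chat/completions"

  payload = {
      "model": "mlx-community/Qwen3-Next-80B-A3B-Instruct-4bit",
      "messages": [{"role": "user", "content": "Write a haiku about MLX."}],
      "max_tokens": 128,
  }

  resp = requests.post(url, json=payload)
  print(resp.json()["choices"][0]["message"]["content"])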