terhechte (3 days ago):
Is there a way to run these Omni models on a MacBook, quantized via GGUF or MLX? I know I can run them in LM Studio or llama.cpp, but those don't support streaming microphone or streaming webcam input. Qwen usually provides example code in Python that requires CUDA and a non-quantized model. I wonder if there is by now a good open source project that supports this use case?
tgtweak (3 days ago):
You can probably follow the vLLM instructions for Omni here, then use the included voice demo HTML to interface with it: https://github.com/QwenLM/Qwen3-Omni#vllm-usage https://github.com/QwenLM/Qwen3-Omni?tab=readme-ov-file#laun...
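For reference, the vLLM path from the linked README has roughly this shape. This is a hedged sketch: the exact model name (`Qwen/Qwen3-Omni-30B-A3B-Instruct`) and the `qwen-omni-utils` helper package are assumptions based on the Qwen repos, and vLLM's GPU backend targets CUDA rather than Apple Silicon, so this addresses the setup but not the MacBook part of the question:

```shell
# Install vLLM and Qwen's multimodal preprocessing helpers
# (model name and package are assumptions -- check the repo's README)
pip install vllm qwen-omni-utils

# Serve the model behind an OpenAI-compatible API,
# then point the repo's voice demo HTML at this endpoint
vllm serve Qwen/Qwen3-Omni-30B-A3B-Instruct --port 8000
```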
mobilio (3 days ago):
Yes - there is a way: https://github.com/ggml-org/whisper.cpp
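whisper.cpp covers the streaming-microphone half of the question (speech-to-text only, not the full Omni vision/chat pipeline): it ships an SDL2-based live transcription example that runs quantized GGUF models on a MacBook. A rough build-and-run sketch, assuming a recent checkout where the example binary is named `whisper-stream`:

```shell
# Build whisper.cpp with the SDL2 streaming example enabled
git clone https://github.com/ggml-org/whisper.cpp
cd whisper.cpp
cmake -B build -DWHISPER_SDL2=ON
cmake --build build -j

# Fetch a small quantized GGUF model
./models/download-ggml-model.sh base.en

# Transcribe live microphone input, updating every ~500 ms
./build/bin/whisper-stream -m models/ggml-base.en.bin --step 500 --length 5000
```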