▲ | simonw 10 hours ago
The model weights are 70GB (Hugging Face recently added a file size indicator - see https://huggingface.co/Qwen/Qwen3-Omni-30B-A3B-Instruct/tree... ) so this one is reasonably accessible to run locally. I wonder if we'll see a macOS port soon - currently it very much needs an NVIDIA GPU as far as I can tell.
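For anyone curious, loading it on an NVIDIA box looks roughly like this - a sketch assuming the standard transformers loading pattern (the omni variant may ship its own model class rather than going through AutoModelForCausalLM):

    # Sketch: load the BF16 weights on CUDA via transformers.
    # Assumes the standard HF pattern; the omni model may need
    # a dedicated class instead of AutoModelForCausalLM.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "Qwen/Qwen3-Omni-30B-A3B-Instruct"
    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,  # weights ship as BF16
        device_map="auto",           # spread across available GPUs
        trust_remote_code=True,      # omni models use custom code
    )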
▲ | a_e_k 10 hours ago | parent | next [-]
That's at BF16, so it should fit fairly well on 24GB GPUs after quantization to Q4, I'd think. (Much like the other 30B-A3B models in the family.) I'm pretty happy about that - I was worried it'd be another 200B+.
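Back-of-envelope (weights only - KV cache and the audio/vision towers add overhead on top):

    # Rough VRAM math for a ~30B-parameter MoE: all experts stay
    # resident even though only ~3B are active per token.
    params = 30e9
    bf16_gb = params * 2 / 1e9    # ~60 GB at 2 bytes/param
    q4_gb = params * 0.5 / 1e9    # ~15 GB at 4 bits/param
    print(f"BF16 ~{bf16_gb:.0f} GB, Q4 ~{q4_gb:.0f} GB (+ scales/cache)")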
▲ | growthwtf 10 hours ago | parent | prev | next [-]
A fun project for somebody with more time than I have would be to see if they can get it working with the new Mojo support for Apple announced yesterday. I don't know if that functionality is baked enough yet to actually pull off the port, but it would be an interesting try.
▲ | dcreater 9 hours ago | parent | prev | next [-]
Is there an inference engine for this on macOS?
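The text-only Qwen3-30B-A3B runs under mlx-lm, something like the sketch below (the mlx-community repo name is a guess at their usual naming), but I haven't seen anything that handles the omni audio/video parts:

    # Sketch: run a 4-bit Qwen3-30B-A3B on Apple silicon with mlx-lm.
    # The repo name here is assumed; omni support is a separate question.
    from mlx_lm import load, generate

    model, tokenizer = load("mlx-community/Qwen3-30B-A3B-4bit")
    print(generate(model, tokenizer, prompt="Hello", verbose=False))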
▲ | varispeed 7 hours ago | parent | prev [-]
Would it run on a 5090? Or is it possible to link multiple GPUs, or has NVIDIA locked that down?
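Tensor parallelism in vLLM presumably still splits it across cards over plain PCIe, no NVLink required - roughly like this (assuming vLLM supports this architecture):

    # Sketch: shard the model across 2 GPUs with vLLM tensor
    # parallelism. Works over PCIe; assumes vLLM supports the
    # Qwen3-Omni architecture.
    from vllm import LLM, SamplingParams

    llm = LLM(
        model="Qwen/Qwen3-Omni-30B-A3B-Instruct",
        tensor_parallel_size=2,  # split weights across 2 GPUs
    )
    out = llm.generate(["Hello"], SamplingParams(max_tokens=32))
    print(out[0].outputs[0].text)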