efficax 15 hours ago
Qwen3.6 does a good job locally, except it can take 20-30 minutes to respond to a prompt on a Mac Studio with 32 GB of RAM.
smcleod 14 hours ago
Apple Silicon before the M4 does not have matmul instructions, which makes prompt processing very slow. It's quite different on the M5, much like using an NVIDIA GPU.
2ndorderthought 15 hours ago
Yeah, you probably do want to use a GPU for models of that size. I also wonder what quantization you are using? If you haven't tried other quants, I really would.