dofm 5 hours ago
Useful stuff in here that I wish I'd seen a few days ago :-) I am not convinced that the MTP setup for the QAT model adds very much in terms of speed on my M1 Max, but it is definitely worth experimenting with. Fiddling about with local models has done so much for my conceptual understanding of what is going on. FWIW and YMMV, but I also found the Gemma 4 MTP head was occasionally breaking markup in Opencode, causing the thinking to display untidily and, in some cases, ultimately missing the stop token. So I've stopped using MTP there for now. Recent Qwen 3.6 models have developer role support, so they will occasionally surprise you with a structured multiple-choice questionnaire.
mft_ 5 hours ago
I found a marginal downside to Qwen3.6-35B-A3B-MTP vs. the non-MTP equivalent on an M1 Max. I may experiment with the settings further, though.
mark_l_watson 2 hours ago
When I started using QAT recently, I stopped trying to improve my configuration. I will try tuning my local environment again in a few months, but with QAT things are good enough for now.