▲ | elsombrero 4 days ago | |
well, I tried it and it works for me. llm output is hard to properly evaluate without actually using it. I read a lot of good comments on r/localllama, with most people suggesting qwen3 coder 30ba3b, but I never got it to work as well as GLM 4.5 air Q1. As for using Q2, it will fit in vram, but with very small context or spill over to RAM, but with quite an impact on speed depending on your setup. I have slow ddr4 ram and going for Q1 has been a good compromise for me, but YMMV. |