dust42 · 3 days ago
On a 64 GB MacBook Pro I run Qwen3-Coder-30B-A3B (Q4 quant) with llama.cpp. For VSCode I use continue.dev, since it lets me set my own (short) system prompt. I get around 50 tokens/s generation and 550 t/s prompt processing. On well-defined small tasks it is as good as any frontier model, and I like the speed, the low latency, and the availability on a plane/train or off-grid. FIM with the llama.cpp VSCode plugin is decent too. When I need more intelligence, my personal favourites are Claude and DeepSeek via API.
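For anyone wanting to reproduce a setup like this, here is a minimal sketch. The model filename, context size, and port are my assumptions, not dust42's exact settings; llama-server is llama.cpp's bundled OpenAI-compatible server.

```
# A minimal sketch (not the exact invocation above): serve a local Q4 GGUF
# of Qwen3-Coder-30B-A3B with llama.cpp's OpenAI-compatible server.
# -ngl 99 offloads all layers to the GPU (Metal on a Mac), -c sets the context window.
# Model filename, context size, and port are assumptions; adjust to your download.
llama-server -m Qwen3-Coder-30B-A3B-Instruct-Q4_K_M.gguf -ngl 99 -c 32768 --port 8080
```

continue.dev can then be pointed at http://localhost:8080/v1 via the apiBase field of its model config, and (if I have the settings right) llama.vscode at the same server for FIM.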
redblacktree · 3 days ago
Would you use a different quant with a 128 GB machine? Could you link the specific download you used on huggingface? I find a lot of the options there to be confusing.
Xenograph · 3 days ago
Have you tried continue.dev's new open completion model [1]? How does it compare to llama.vscode FIM with Qwen?
codingbear · 3 days ago
How are you running Qwen3 with llama-vscode? I am still using qwen-2.5-7b. There is an open issue about adding support for Qwen3 which I have been monitoring; I would love to use Qwen3 if possible. Issue: https://github.com/ggml-org/llama.vscode/issues/55