codingbear 3 days ago
I use local models for code completion only, which means models that support FIM tokens. My current setup is the llama-vscode plugin + llama-server running Qwen/Qwen2.5-Coder-7B-Instruct. It gives very fast completions, and I don't have to worry about internet outages taking me out of the zone. I do wish the Qwen3 release had included a 7B model supporting FIM tokens; 7B seems to be the sweet spot for fast and usable completions.
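For anyone curious what the plugin is doing under the hood, here's a minimal Python sketch of requesting a FIM completion from llama-server's /infill endpoint. It assumes a server already running on the default port (8080) with a FIM-capable model loaded; the prefix/suffix strings are just placeholder examples:

    # Minimal sketch: ask a running llama-server for a FIM completion.
    # Assumes llama-server is listening on the default port (8080) with
    # a FIM-capable model loaded; field names follow llama.cpp's
    # /infill API (input_prefix / input_suffix).
    import json
    import urllib.request

    payload = {
        "input_prefix": "def fibonacci(n):\n    ",   # code before the cursor
        "input_suffix": "\n\nprint(fibonacci(10))",  # code after the cursor
        "n_predict": 64,                             # cap the completion length
    }

    req = urllib.request.Request(
        "http://127.0.0.1:8080/infill",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

    with urllib.request.urlopen(req) as resp:
        print(json.loads(resp.read())["content"])    # the generated middle

The editor plugin does essentially this on every keystroke pause, splicing the returned "content" between the prefix and suffix at the cursor.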
Mostlygeek 3 days ago
qwen3-coder-30B-A3B supports FIM and should be faster than the 7B if you have the VRAM. I use bartowski's Q8 quant across dual 3090s and it gets up to 100 tok/sec. The Q4 quant on a single 3090 is very fast and decently smart.
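Rough back-of-envelope on the VRAM side (approximate bits-per-weight for weights only, ignoring KV cache and runtime overhead), sketching why Q8 wants two 24 GB cards while Q4 fits on one:

    # Back-of-envelope VRAM estimate; the bytes-per-param figures are
    # approximations for typical GGUF quants, not measured file sizes.
    params_b = 30.5          # qwen3-coder-30B-A3B total params, billions

    q8_gb = params_b * 1.06  # Q8_0 ~8.5 bits/param -> ~32 GB of weights
    q4_gb = params_b * 0.59  # Q4_K_M ~4.7 bits/param -> ~18 GB of weights

    print(f"Q8 weights ~{q8_gb:.0f} GB -> needs dual 3090s (2x24 GB)")
    print(f"Q4 weights ~{q4_gb:.0f} GB -> fits a single 3090 (24 GB)")

Since it's an A3B MoE, only ~3B parameters are active per token, which is why it can beat a dense 7B on speed even though the full weights are much larger.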