solarkraft 15 hours ago
Looks like it: https://ollama.com/library/qwen3-vl:30b-a3b
thot_experiment 5 hours ago | parent
FWIW, on my machine it's 1.5x faster to run inference in llama.cpp. These are the settings I use for the Qwen model I keep in VRAM permanently: