Remix clone Hacker News


	▲	vunderba 2 days ago
		The easiest way to get started is probably to run the 4-bit quantized `qwen3-vl:8b` model through something like Ollama [1]. It's a good balance between accuracy and memory, though in my experience it's slower than older model architectures such as LLaVA. Just be aware that Qwen-VL tends to be a bit verbose [2], and you can't reliably control that with token limits: the output just gets cut off abruptly. You can ask it to be more concise, but that's hit or miss. What I often end up doing, and I admit it's a bit ridiculous, is letting Qwen-VL generate its full detailed output and then passing that to a different LLM to summarize.

		[1] https://ollama.com/library/qwen3-vl:8b
		[2] https://mordenstar.com/other/vlm-xkcd
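For anyone who wants to try the describe-then-summarize trick, here is a rough sketch against Ollama's local `/api/generate` endpoint using only the Python standard library. The summarizer model (`llama3.2`), the prompts, and the helper names are my own placeholders, not anything from the comment above; swap in whatever text model you have pulled.

```python
import base64
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
VISION_MODEL = "qwen3-vl:8b"   # the model named in the comment above
SUMMARY_MODEL = "llama3.2"     # assumption: any small text model you have pulled


def build_payload(model, prompt, image_bytes=None):
    """Build a non-streaming /api/generate request body."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    if image_bytes is not None:
        # Ollama expects images as a list of base64-encoded strings.
        payload["images"] = [base64.b64encode(image_bytes).decode("ascii")]
    return payload


def post_json(payload, url=OLLAMA_URL):
    """POST a JSON payload and return the decoded JSON reply."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def describe_then_summarize(image_bytes, post=post_json):
    """Let the vision model write freely, then condense with a second model."""
    # Step 1: no token cap, so the description is never cut off mid-sentence.
    verbose = post(build_payload(VISION_MODEL, "Describe this image.", image_bytes))["response"]
    # Step 2: hand the full text to a different model for a short summary.
    summary_prompt = "Summarize this image description in two sentences:\n\n" + verbose
    concise = post(build_payload(SUMMARY_MODEL, summary_prompt))["response"]
    return verbose, concise
```

With Ollama running and both models pulled, something like `describe_then_summarize(open("comic.png", "rb").read())` should return the long description and the short one. The `post` parameter is only there so the network call can be swapped out for testing.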