I can never see the point, though. Performance isn't anywhere near Opus, and even that gets confused following instructions or making tool calls in demanding scenarios. Open weights models are just light years behind.

I really, really want open weights models to be great, but I've been disappointed with them. I don't even run them locally, I try them from providers, but they're never as good as even the current Sonnet.

▲

vunderba 3 hours ago | parent | next [-]

I can't speak to using local models as agentic coding assistants, but I have a headless 128GB RAM machine serving llama.cpp with a number of local models that I use on a daily basis.

- Qwen3-VL picks up new images in a NAS, auto captions and adds the text descriptions as a hidden EXIF layer into the image, which is used for fast search and organization in conjunction with a Qdrant vector database.

- Gemma3:27b is used for personal translation work (mostly English and Chinese).

- Llama3.1 spins up for sentiment analysis on text.

▲

stavros 3 hours ago | parent [-]

Ah yeah, self-contained tasks like these are ideal, true. I'm more using it for coding, or for running a personal assistant, or for doing research, where open weights models aren't as strong yet.

	▲	vunderba 2 hours ago \| parent [-]
		Understood. Research would make me especially leery; I’d be afraid of losing any potential gains as I'd feel compelled to always go and validate its claims (though I suppose you could mitigate it a little bit with search engine tooling like Kagi's MCP system).

▲

andoando 4 hours ago | parent | prev | next [-]

They're great for some product use cases where you dont need frontier models.

	▲	stavros 4 hours ago \| parent [-]
		Yeah, for sure, I just don't have many of those. For example, the only use I have for Haiku is for summarizing webpages, or Sonnet for coding something after Opus produces a very detailed plan. Maybe I should try local models for home automation, Qwen must be great at that.

▲

lm28469 4 hours ago | parent | prev [-]

They're like 6 months away on most benchmarks, people already claimed coding wad solved 6 months ago, so which is it? The current version is the baseline that solves everything but as soon as the new version is out it becomes utter trash and barely usable

	▲	zozbot234 4 hours ago \| parent \| next [-]
		That's very large models at full quantization though. Stuff that will crawl even on a decent homelab, despite being largely MoE based and even quantization-aware, hence reducing the amount and size of active parameters.
	▲	stavros 4 hours ago \| parent \| prev [-]
		That's just a straw man. Each frontier model version is better than the previous one, and I use it for harder and harder things, so I have very little use for a version that's six months behind. Maybe for simple scripts they're great, but for a personal assistant bot, even Opus 4.6 isn't as good as I'd like.