Things like these (Google also banned me from Antigravity for briefly using an agent) and the massive quality swings made me cancel all 3 subs last week and rely only on my local Qwen 3.6. Open models are already great and only getting better, and I really enjoy the privacy and consistency of a model I run myself.

SeanAnderson 3 hours ago

I don't think anyone is questioning all the benefits of using local LLMs. Those are readily apparent.

I just don't believe for an instant that they're anywhere in the same ballpark of capabilities as running Opus or similar. My time is the most valuable resource. Opus would need to be SIGNIFICANTLY more costly and unstable for me to start entertaining local models for day-to-day development.

Perhaps whatever work you're doing makes this trade-off more sensible, but I struggle to see how that could be true. I'm averse to running Sonnet on a large number of software engineering problems, let alone Qwen.

regexorcist 2 hours ago

I think you'd be surprised. I find that the harness is what makes the real difference. I also prefer to be on the loop, actively guiding and reviewing. Local models are definitely much less autonomous today, so if you need to churn out code at speed, they're probably not for you.

slopinthebag an hour ago

If you know what you're doing and prompt it correctly, local models are great. If you're just vibe coding and relying on the LLM to fill in all the gaps and basically build the software for you, then yeah, you need SOTA to deal with that.

jrm4 3 hours ago

But, you know,

Yet.

	dmd 2 hours ago
		For now we infer through few weights, lossily; but then in full precision. Now I represent in part; but then shall I represent as fully as the data was sampled. 1 CorinthAIns 13:12

tjpnz 22 minutes ago

Spent the better part of a week trying to integrate local models into my LazyVim workflow. I've tried both Avante and CodeCompanion and have yet to find a configuration that even remotely works. Either it goes into an endless loop, the project directory fills up with garbage, or it can't find the file to apply changes to, even though it just read from that file. Not sure if it's a Qwen problem, the plugins, or Ollama.

klaussilveira 3 hours ago

How much VRAM do you need to achieve decent performance?

	regexorcist 2 hours ago
		I have a 64GB M1 Ultra dedicated to llama.cpp. I get 40 tok/s on a fresh session, decreasing slowly to about 25 tok/s at around 50% of the 256K context, then down to 20 tok/s or less beyond that, though I rarely let it go much higher and hand off instead. This is with Qwen 36B A3B at Q8 without KV quantization. It's not super fast, but it's perfectly usable for me.
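The reply above gives a working setup but not the memory arithmetic behind the VRAM question. A rough sketch, under assumptions that are mine and not the commenter's: weights take about params × bits-per-weight / 8 bytes (llama.cpp's Q8_0 costs about 8.5 bits per weight, since each block of 32 int8 weights also stores one fp16 scale), and an unquantized fp16 KV cache grows linearly with context. The layer count, KV-head count and head dimension below are hypothetical placeholders, not the published config of any Qwen model; the helper names are made up for illustration.

```python
# Rough memory estimate for running a quantized LLM locally.
# ASSUMPTION: the layer/head numbers in the example are hypothetical
# placeholders, not the published config of any Qwen model.

GIB = 2**30  # bytes per GiB


def weight_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GiB."""
    return n_params * bits_per_weight / 8 / GIB


def kv_cache_gib(n_layers: int, n_kv_heads: int, head_dim: int,
                 context_tokens: int, bytes_per_elem: int = 2) -> float:
    """Approximate KV-cache size in GiB for standard (full) attention.

    The leading 2 accounts for storing both keys and values;
    bytes_per_elem=2 assumes an unquantized fp16 cache.
    """
    return 2 * n_layers * n_kv_heads * head_dim * context_tokens * bytes_per_elem / GIB


if __name__ == "__main__":
    # Q8_0: 32 int8 weights + one fp16 scale per block = 34 bytes / 32 weights = 8.5 bits.
    weights = weight_gib(36e9, 8.5)
    # Hypothetical architecture: 48 layers, 4 KV heads, head_dim 128,
    # filled to half of a 256K (262,144-token) context window.
    kv = kv_cache_gib(n_layers=48, n_kv_heads=4, head_dim=128, context_tokens=131_072)
    print(f"weights  ~{weights:.1f} GiB")       # weights  ~35.6 GiB
    print(f"KV cache ~{kv:.1f} GiB")            # KV cache ~12.0 GiB
    print(f"total    ~{weights + kv:.1f} GiB")  # total    ~47.6 GiB
```

With these placeholder numbers the total stays under 64 GB, but the real figure depends on the model's actual attention layout (layers without a full KV cache, such as hybrid or sliding-window ones, shrink it) and on runtime overhead this sketch ignores.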

2ndorderthought 2 hours ago

This is the future.