beej71 16 hours ago

Believe me when I say that I want to run local models, and I do. But in my testing, 24 GB doesn't get you much brainpower.

2ndorderthought 15 hours ago | parent [-]

Have you tried the latest qwen3.6 models?

For most of my questions, an 8-9B model works great. The upshot is not having ChatGPT/Meta sell my data or use my random thoughts to target me later.
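
Concretely, the workflow barely changes. Rough sketch below, assuming the model sits behind a local OpenAI-compatible endpoint (llama.cpp's llama-server and Ollama both expose one); the URL and model name here are just placeholders:

    # Sketch: the same chat-completions call, just pointed at a local
    # server, so the prompt never leaves the machine.
    from openai import OpenAI

    client = OpenAI(
        base_url="http://localhost:8080/v1",  # local endpoint (placeholder port)
        api_key="not-needed",                 # local servers usually ignore the key
    )

    resp = client.chat.completions.create(
        model="qwen-8b-local",  # placeholder name for a local 8-9B model
        messages=[{"role": "user", "content": "Summarize this log for me: ..."}],
    )
    print(resp.choices[0].message.content)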

entrope 14 hours ago | parent | next [-]

I let Qwen3.6-27B chew on a bug all last night. It choked at some point and stopped responding (probably a context overflow before pi-coding-agent could compact it). Claude Sonnet 4.6 found and fixed the bug in under 10 minutes.

Qwen3.6 is pretty amazing for a 27B model, but it's not hard to run into its limits. With a Radeon R9700 and unsloth's 6-bit quantization, I get ~20 TPS and 110k context, so it can do a fair bit quickly.
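
For the curious, the setup is roughly this. It's a sketch via llama-cpp-python, not my actual stack (the real run goes through pi-coding-agent), and the GGUF filename is a placeholder:

    # Sketch: load a ~6-bit GGUF quant with a large context window,
    # offloading every layer to the GPU.
    from llama_cpp import Llama

    llm = Llama(
        model_path="Qwen3.6-27B-Q6_K.gguf",  # placeholder filename for the 6-bit quant
        n_ctx=110_000,                        # ~110k context window
        n_gpu_layers=-1,                      # offload all layers to the GPU
    )

    out = llm.create_chat_completion(
        messages=[{"role": "user", "content": "Here's the failing test and the stack trace: ..."}],
        max_tokens=1024,
    )
    print(out["choices"][0]["message"]["content"])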

2ndorderthought 14 hours ago | parent [-]

You definitely need to watch it more closely than a model 100 times larger. But the fact that it runs on one GPU and does what it does is insane. Imagine what a 30B model will look like in 6 months or a year.

datadrivenangel 14 hours ago | parent | prev | next [-]

Inference speed is still slow in a meaningfully different way. Local models are good, but not great, and much slower, which for coding means a task that takes 2-3 minutes with Claude Code and Opus takes an hour and has a higher chance of being wrong.

2ndorderthought 12 hours ago | parent [-]

It's only slow if you can't afford to run it properly. A lot of people are getting 70-100 tokens per second on one GPU.

Not sure what Claude Opus or Sonnet run at. I do know that when it goes offline it's 0 tokens per second.
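
If you want to sanity-check your own numbers, something like this works against whatever local OpenAI-compatible endpoint you run (URL and model name are placeholders, and counting chunks is only an approximation of tokens):

    # Sketch: rough tokens-per-second estimate by streaming a completion
    # and timing the chunks. Good enough for eyeballing throughput.
    import time
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

    start = time.time()
    chunks = 0
    for chunk in client.chat.completions.create(
        model="qwen-27b-local",  # placeholder
        messages=[{"role": "user", "content": "Write a 500-word explanation of B-trees."}],
        stream=True,
    ):
        if chunk.choices and chunk.choices[0].delta.content:
            chunks += 1  # roughly one token per chunk on most servers

    elapsed = time.time() - start
    print(f"~{chunks / elapsed:.1f} tokens/sec")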

ekjhgkejhgk 15 hours ago | parent | prev [-]

We're in the same boat. I would rather have NO LLM than an LLM that collects my data (which you should assume is all of them, unless you've been asleep for the last 20 years).

Fortunately, I don't have to pick one or the other - instead I run Qwen 3.6 35B A3B. It's a bit slow on my 8 GB GPU (I'm in the process of getting a bigger one), but again, to me the choice isn't "what's the best I can get", it's "what's the best local model I can get".