| ▲ | alsetmusic 3 hours ago |
Local models are 6 to 18 months behind frontier. Even if cloud models are faster, it's clear that local is catching up.
|
| ▲ | alecco 2 hours ago | parent | next [-] |
> Local models are 6 to 18 months behind frontier. I wish this were true, but it is not. And I work on open source models, so if anything I would be biased toward agreeing with you. Frontier closed models (GPT/Claude) are pulling away from everybody else, even Google, once the king. Your claim is a meme born of benchmark results, and sadly a lot of models are benchmaxxed: see Llama 4, and most notably the Grok 3 drama with all the layoffs. And Chinese big tech... well, they have some cultural issues. "Qwen's base models live in a very exam-heavy basin - distinct from other base models like llama/gemma. Shown below are the embeddings from randomly sampled rollouts from ambiguous initial words like "The" and "A":" https://xcancel.com/N8Programs/status/2044408755790508113 --- But thank god at least we have DeepSeek. They keep releasing good models in spite of being so seriously resource constrained, punching well above their weight. But they are not just 6 months behind, either.
| |
| ▲ | crystal_revenge 38 minutes ago | parent | next [-] | | I’ve worked professionally in the open model space for 3 years, and up to 2 months ago I would have agreed with you. But it’s empirically not the case today. These models (combined with a good harness) have dramatically improved in both power and performance. Gemma 4 was a major improvement in self-hostable local models, and Qwen-3.6-A34B is a beast that runs great on an MBP (and insanely well on a 4090). The biggest lift is combining these models with a good agent harness (I personally prefer the Hermes agent). But I’ve found in practice they’re really not benchmaxxing: I’ve had these agents successfully handle a few non-trivial research projects that I wouldn’t have been able to accomplish as successfully even last year. When you add in the open-but-not-local models (Kimi, GLM, Minimax), you have a lot of very nice options. For personal use, anything I don’t use local models for I give to my Kimi 2.6-powered agent. | |
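(For readers unfamiliar with the term, an "agent harness" is essentially a tool-calling loop wrapped around the model. A minimal sketch against a local OpenAI-compatible endpoint; the URL, model name, and single shell tool are illustrative placeholders, not the Hermes setup mentioned above:)

    import json, subprocess
    from openai import OpenAI

    # Any local server exposing the OpenAI-compatible API works here
    # (llama.cpp's llama-server, vLLM, ...). URL and model are placeholders.
    client = OpenAI(base_url="http://localhost:8080/v1", api_key="unused")

    tools = [{
        "type": "function",
        "function": {
            "name": "run_shell",
            "description": "Run a shell command and return its output",
            "parameters": {
                "type": "object",
                "properties": {"cmd": {"type": "string"}},
                "required": ["cmd"],
            },
        },
    }]

    messages = [{"role": "user", "content": "How many .py files are in this repo?"}]
    while True:
        resp = client.chat.completions.create(
            model="local-model", messages=messages, tools=tools)
        msg = resp.choices[0].message
        messages.append(msg)
        if not msg.tool_calls:       # no more tool use: final answer
            print(msg.content)
            break
        for call in msg.tool_calls:  # execute each requested tool call
            cmd = json.loads(call.function.arguments)["cmd"]
            out = subprocess.run(cmd, shell=True, capture_output=True, text=True)
            messages.append({"role": "tool", "tool_call_id": call.id,
                             "content": out.stdout + out.stderr})

(Real harnesses add sandboxing, context management, and retries on top of this loop; that plumbing is most of the "lift" being described.)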
| ▲ | dools 2 hours ago | parent | prev | next [-] | | Kimi K2.6 is about on par with GPT 5.2, so I’d say open-weight models are about 6 months behind. | | |
| ▲ | cbg0 2 hours ago | parent | next [-] | | The Q4 quantization requires about 600GB of RAM without context, not exactly consumer hardware friendly. | |
| ▲ | janderland 2 hours ago | parent | prev [-] | | Has Kimi found a way to vastly reduce the amount of VRAM required without running at 3 tokens per second? That’s the real concern. |
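(Some rough arithmetic behind both the 600GB figure and the speed concern, assuming Kimi K2's published shape of roughly 1T total / 32B active parameters; the overhead factor is a guess:)

    # Back-of-envelope for a ~1T-total / ~32B-active MoE at Q4. Approximations only.
    total_params    = 1.0e12  # all weights must be resident in RAM/VRAM
    active_params   = 32e9    # weights actually used per token (MoE routing)
    bits_per_weight = 4.5     # Q4 GGUF quants average a bit over 4 bits/weight

    weights_gb = total_params * bits_per_weight / 8 / 1e9
    print(f"~{weights_gb * 1.1:.0f} GB resident before KV cache")  # ~619 GB

    # Per-token compute scales with the *active* params, so generation can be
    # as fast as a 32B dense model, but only if all experts sit in fast memory.
    # Spilling experts out to slow system RAM is what produces ~3 tok/s.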
| |
| ▲ | tyre 2 hours ago | parent | prev [-] | | The Chinese models should stay close, on a lag. They’re doing a ton of distillation that, realistically, I’m not sure the American frontier labs can stop. | | |
|
|
| ▲ | __s 3 hours ago | parent | prev | next [-] |
You still need the hardware. I've got a 128GB Strix Halo staying warm at home, and it has nothing on the top models with big budgets behind them. It's a good supplement to low-end plans for offloading grunt work / initial triage.
| |
| ▲ | manmal 3 hours ago | parent [-] | | Have you looked into DwarfStar 4? | | |
| ▲ | __s 2 hours ago | parent [-] | | Been away from home for nearly a month, so was mostly going off Qwen 3.5 122b-a10b (Q4?) / Qwen 3.6 35b-a3b (Q8) / Gemma4 31b (Q8). Thanks for the suggestion tho, a tool by antirez is always going to pique interest; I'll check it out when I'm finally home again. Tho it says Metal / CUDA, so it doesn't seem friendly to a Linux AMD system. | | |
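(For anyone wanting to try a similar local setup, a minimal sketch with llama-cpp-python; the GGUF filename is hypothetical, substitute whichever quant fits your memory, and llama.cpp's Vulkan/ROCm backends do cover Linux AMD boxes like a Strix Halo:)

    from llama_cpp import Llama  # pip install llama-cpp-python

    # Hypothetical GGUF filename; use whichever quant fits your RAM.
    llm = Llama(
        model_path="qwen3.6-35b-a3b-q8_0.gguf",
        n_gpu_layers=-1,  # offload every layer to the GPU/iGPU
        n_ctx=32768,      # context window; shrink if memory is tight
    )
    out = llm.create_chat_completion(
        messages=[{"role": "user", "content": "Triage this bug report: ..."}])
    print(out["choices"][0]["message"]["content"])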
| ▲ | manmal 41 minutes ago | parent [-] | | His quant that fits into 128GB looks interesting for the DGX Spark as well, IMO. |
|
|
|
|
| ▲ | greesil 3 hours ago | parent | prev | next [-] |
How do you know this? I'm not trying to attack your statement; I'm genuinely curious how anyone knows anything about model performance outside of benchmarks that are already in the training set.
| |
| ▲ | scragz 3 hours ago | parent [-] | | Using them, you kind of get a feeling for their skill level, and you can extrapolate from that better than from juiced benchmarks. |
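(One way to firm that feeling up is a tiny private eval set that never leaves your machine, so it can't leak into anyone's training data. A minimal sketch against a local OpenAI-compatible endpoint; the tasks and model name are placeholders:)

    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8080/v1", api_key="unused")

    # Keep these tasks private so they can't end up in a training corpus.
    private_evals = [
        ("What is 17 * 23?",               lambda a: "391" in a),
        ("Name the capital of Australia.", lambda a: "Canberra" in a),
    ]

    passed = 0
    for prompt, check in private_evals:
        resp = client.chat.completions.create(
            model="local-model",
            messages=[{"role": "user", "content": prompt}])
        if check(resp.choices[0].message.content):
            passed += 1
    print(f"{passed}/{len(private_evals)} private tasks passed")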
|
|
| ▲ | lukeschlather 3 hours ago | parent | prev | next [-] |
It is not getting easier to obtain hardware that can run models useful enough to undercut frontier models; if anything, the cost of such hardware has gone up by 25% or more just in the past 6 months.
| |
| ▲ | aleqs 2 hours ago | parent [-] | | I think hardware prices will come back down once we see more efficiency improvements in models and hardware, and once more people and companies self-host models (which seems to be happening more and more). I think the massive infra/hardware expenditures of OpenAI and the like will end up being unnecessary, leading to hardware price drops. | | |
| ▲ | t-sauer an hour ago | parent [-] | | If companies decide to self-host, wouldn't that drive the demand and therefore prices up? Most companies currently do not have the needed infrastructure. | | |
| ▲ | aleqs 29 minutes ago | parent [-] | | I think companies will self host (including on rented hardware) even if it's more expensive, and that, along with efficiency improvements, will drop demand for big AI. I think big AI is overspending on hardware/datacenters at the moment. |
|
|
|
|
| ▲ | calvinmorrison 3 hours ago | parent | prev [-] |
If that's true, and in 6 or 12 months I can run locally what I have today, it might not be worth paying Anthropic.