adamtaylor_13 5 hours ago

Cool, well let me know when Opus 4.5 level performance is available locally, at speeds that serve everyday use, and 100% I'm right there with you.

Until then, I'm going to keep sending my JSON to the server farm in Virginia because it's the only place that can serve me a model that actually works for my uses.

Aurornis 5 hours ago | parent | next [-]

I experiment a lot with local models, and I agree.

I have a lot of fun with local models, seeing what they can do.

I appreciate the SOTA models even more after my local experiments. The local models are really impressive these days, but the gap to SOTA is huge for complex tasks.

agnishom an hour ago | parent | prev | next [-]

The article is not about those use cases. There are plenty of use cases for which local models are already pretty good.

janalsncm 2 hours ago | parent | prev | next [-]

Reasoning over a large codebase is only one use case for large models. For the use cases in the article (summarizing, classifying, basic text rewrites) most phones can handle them just fine.

binyu 5 hours ago | parent | prev | next [-]

DeepSeek V4 with 1 million token context window is pretty powerful, although still not there. There's hope that Opus 4.5 level performance locally is not that far away.

Aurornis 5 hours ago | parent | next [-]

Running DeepSeek V4 without extreme quantization locally requires a lot of hardware.

The IQ2 quants that fit into 128GB machines are very degraded.

binyu 5 hours ago | parent [-]

That is true; it is a 1.6T-parameter model, so it requires a great deal of memory. I also heard there's a 2-bit quantization that works well on Apple Metal.
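For a rough sense of scale, the weight memory is just parameter count times bits per weight. This is a back-of-envelope sketch only: it ignores KV cache, activations, and per-block quantization overhead, so real usage is somewhat higher.

```python
def estimate_weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Rough memory needed just to hold the weights, in gigabytes.

    Ignores KV cache, activations, and quantization-format overhead,
    so actual memory use will be somewhat higher.
    """
    return n_params * bits_per_weight / 8 / 1e9

# A 1.6T-parameter model at various quantization levels:
for bits in (16, 8, 4, 2):
    print(f"{bits}-bit: ~{estimate_weight_memory_gb(1.6e12, bits):.0f} GB")
```

Even at 2 bits per weight, the weights alone of a 1.6T-parameter model come to roughly 400 GB, which is why aggressive quantization is the only way such models approach consumer hardware at all.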

tuananh 5 hours ago | parent | prev [-]

From what I read, ds v4 is very close to opus 4.6 in performance.

DeathArrow 2 hours ago | parent [-]

The full model is, not the quantized versions.

tuananh an hour ago | parent [-]

yeah, that goes without saying. how could an open-weight, quantized version beat SOTA :)

thefounder 5 hours ago | parent | prev | next [-]

Next year there will be Opus 4.5-level performance available in open-source models, so theoretically you may be able to run it locally. In reality, though, it will be too expensive (e.g. maybe 2 x Mac Studio with 512GB RAM each) for "normal" users.

storus 5 hours ago | parent | prev | next [-]

Depending on the task, there are already models matching Opus 4.5. Just not in everything. But you can always swap in a local model for a particular task.

bugglebeetle 5 hours ago | parent | prev [-]

The frontier Chinese open-source models, specifically GLM-5.1 and Kimi K2.6, are already at this level.

DeathArrow 2 hours ago | parent [-]

But you can't run them locally at full quality. And the quantized versions you can run locally are a far cry from Opus 4.6.

bugglebeetle 2 hours ago | parent [-]

Anthropic serves quantized versions of their models, and you can run Q8 locally.