barrkel 11 hours ago

Local models are extraordinarily expensive if you're not maximizing throughput, and you're not going to be maximizing it.

Local models need to be resident in expensive RAM, the kind that has fat pipes to compute. And if you have a local app, how do you take a dependency on whatever random model is installed? Does it support your tool calling complexity? Does it have multimodal input? Does it support system messages in the middle of the conversation or not? Is it dumb enough to need reminders all the time?

Spend enough time building against local models and you'll see they're jagged in performance. You need to tune context size and trade off system message complexity against progressive disclosure. You simply can't rely on intelligence. A bunch of work goes into the harness.
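For illustration, a minimal Python sketch of that kind of harness work (the context budget, the JSON "tool call" format, and the function names are assumptions, not any particular library): trim the prompt to a small context budget and fall back to a simpler system message when the model's output doesn't parse.

    import json

    CONTEXT_BUDGET_CHARS = 8000  # assumed budget for a small local model

    def run_with_fallback(complete, system_full, system_minimal, user_msg):
        # `complete(system, user)` is whatever local inference call you have.
        user_msg = user_msg[-CONTEXT_BUDGET_CHARS:]    # crude context trim
        for system in (system_full, system_minimal):   # progressively simpler prompt
            raw = complete(system, user_msg)
            try:
                return json.loads(raw)                 # expect a JSON "tool call"
            except json.JSONDecodeError:
                continue                               # retry with the simpler prompt
        return None                                    # caller handles the failure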

Meanwhile, third party inference is getting the benefits of scale. You only need to rent a timeslice of memory and compute. It's consistent and everybody gets the same experience. And yes, it needs paying for, but the economics are just better.

LPisGood 11 hours ago | parent | next [-]

> And if you have a local app, how do you take a dependency on whatever random model is installed?

Reading the tea leaves here, it will probably become common for OSes to have built-in models that can be accessed via an API. Apple already does this.

crazygringo 9 hours ago | parent | prev | next [-]

I don't know why you are being downvoted. These are precisely the facts that advocates for local models completely ignore.

Local models are absolutely going to be the future for things like simple automation and classification tasks that run occasionally and don't need to rely on internet access.

But for all of the serious stuff where you are doing knowledge work, the models will simply continue to be too big, and too slow to run locally.

The article says:

> Use cloud models only when they’re genuinely necessary.

But at least for me, they're genuinely necessary for 99+% of my LLM usage.

At the end of the day, the constraint here really is efficiency and cost.

Privacy can be ensured with the legal system, the same way that businesses that compete with Google still have no problem storing their data in Google Workspace and Google Cloud. The contractual guarantees of privacy are ironclad, and Google would lose its entire cloud business overnight as its customers fled if it ever violated those contractual agreements (on top of whatever penalties they allow for).

bheadmaster 11 hours ago | parent | prev [-]

> And if you have a local app, how do you take a dependency on whatever random model is installed?

Why not ship your own model? In the age of Electron apps, 10GB+ apps are not unheard of.

_heimdall 11 hours ago | parent | next [-]

Personally, I wouldn't want a couple dozen apps installed, each with its own model.

It seems easier to have industry specs that define a common interface for local models.

I also assume the OS can, or would need to, be involved in providing the models. That may not be a good thing depending on your views of OS vendors, but sharing a single local model does seem more like an OS concern.

alex7o 11 hours ago | parent [-]

I mean, the OpenAI API is already the industry standard for letting apps communicate with models: llama-server has it, MLX has it, Ollama has it, vLLM has it, LM Studio as well. I don't think this is such a hard thing to do, but it requires people to set it up.
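As a minimal sketch, assuming a llama-server- or Ollama-style server is already running locally with an OpenAI-compatible /v1 endpoint (the port and model name below are placeholders):

    from openai import OpenAI

    # Point the standard OpenAI client at the local server instead of the cloud.
    client = OpenAI(base_url="http://localhost:11434/v1", api_key="unused")

    resp = client.chat.completions.create(
        model="local-model",  # whatever name the local server exposes
        messages=[{"role": "user", "content": "Summarize this in one line: ..."}],
    )
    print(resp.choices[0].message.content)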

_heimdall 11 hours ago | parent [-]

I don't know enough about that API surface to know if it's a particularly good one for the use cases we'd have, but yes, defining a universal spec for all implementors to support wouldn't be a big lift and is done in plenty of other areas already.

alex7o 11 hours ago | parent | prev [-]

There is no other way than shipping your own model, because you want an abstracted API over the inference and you don't know what the user has installed. Also, you can ship a 9B FP4 model, but it all just depends.
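For example, a minimal sketch of bundling a quantized model inside the app itself, here using llama-cpp-python (the file path, model size, and parameters are assumptions):

    from llama_cpp import Llama

    # Load a quantized GGUF model shipped with the app's own assets.
    llm = Llama(model_path="assets/bundled-9b-q4.gguf", n_ctx=4096)

    out = llm.create_chat_completion(
        messages=[{"role": "user", "content": "Classify this ticket: 'refund my order'"}],
        max_tokens=32,
    )
    print(out["choices"][0]["message"]["content"])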

_heimdall 11 hours ago | parent | next [-]

Knowing what's installed would have to be an OS API, with local models exposing a standard API surface to the OS, likely including metadata about feature support.
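Purely as a hypothetical sketch of what such an OS-level discovery surface could look like (none of these names exist today):

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class LocalModelInfo:
        name: str
        context_window: int
        supports_tools: bool
        supports_vision: bool

    def pick_model(installed: list[LocalModelInfo], need_tools: bool,
                   need_vision: bool) -> Optional[LocalModelInfo]:
        # Let the app choose the first installed model that meets its needs,
        # instead of hard-coding a dependency on one specific model.
        for m in installed:
            if need_tools and not m.supports_tools:
                continue
            if need_vision and not m.supports_vision:
                continue
            return m
        return None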

LPisGood 11 hours ago | parent | prev [-]

You can know what the user has installed if the OS developer offers something.