| ▲ | gervwyk 2 hours ago |
| Anyone else looking at these developments and thinking that local LLMs are the future? There are so many advantages over remote, and the hardware is just not there yet, but another leap like Apple Silicon and the tech is there. Of course large corps will have fancy proprietary models, but for everyday queries and tasks, local feels like a huge win, just slightly out of reach. Am I missing something fundamental? |
|
| ▲ | daft_pink 2 hours ago | parent | next [-] |
| I’ve always believed local is the future. Consider how your iPhone’s processor is more powerful than systems that were very large not too long ago. |
| |
| ▲ | aegis_camera an hour ago | parent [-] | | I've run this on an iPhone 13 Pro (6 GB memory); Qwen3 1.7B runs well.
So local models are already, or soon will be, intelligent enough for the tasks you want done. |
|
|
| ▲ | cl0ckt0wer 2 hours ago | parent | prev [-] |
| LLM intelligence seems to be roughly proportional to the RAM used. Techniques like this will be adopted by everyone. |
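To make the "proportional to RAM" intuition concrete, here is a back-of-the-envelope sketch of local memory needs: weights plus KV-cache. All shapes and quantization numbers below are illustrative assumptions (a hypothetical ~1.7B model at roughly 4.5 bits/param, 28 layers, 8 KV heads of dim 128, 8k context), not measurements of any real model:

```python
# Rough RAM estimate for running an LLM locally: weights plus KV-cache.
# All numbers are illustrative assumptions, not measurements.

def model_ram_gb(params_b: float, bytes_per_param: float) -> float:
    """Weight memory in GB for a model with params_b billion parameters."""
    return params_b * 1e9 * bytes_per_param / 1e9

def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                context: int, bytes_per_elem: float = 2.0) -> float:
    """KV-cache memory in GB: 2 tensors (K and V) per layer per token."""
    return 2 * layers * kv_heads * head_dim * context * bytes_per_elem / 1e9

# Hypothetical 1.7B model quantized to ~4.5 bits/param (~0.56 bytes/param):
weights = model_ram_gb(1.7, 0.56)
# Hypothetical shapes: 28 layers, 8 KV heads of dim 128, 8k context, fp16 cache:
cache = kv_cache_gb(28, 8, 128, 8192)
print(f"weights ~ {weights:.2f} GB, kv-cache ~ {cache:.2f} GB")
# -> weights ~ 0.95 GB, kv-cache ~ 0.94 GB
```

Under those assumptions, a small quantized model plus its cache fits comfortably in a phone's 6 GB, which matches the on-device reports above.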
| |
| ▲ | zozbot234 35 minutes ago | parent [-] | | You can almost always use less RAM by making inference slower. Streaming MoE active weights from SSD is an especially effective variety of this, but even with a large dense model you could run inference on a layer-wise basis (perhaps coalescing only a few layers at a time) if the model on its own is too large for your RAM. You need to store the KV-cache, but that takes only modest space and, at least for ordinary transformers (no linear-attention tricks), is append-only, which fits well with writing it to SSD (AIUI, this is also how "cached" prompts/conversations work under the hood). |
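The layer-wise idea above can be sketched in a few lines: keep only the current layer's weights resident in RAM, streaming each layer in from storage as it is needed. This is a toy illustration with made-up 4x4 layers stored as JSON files standing in for SSD-resident shards, not a real inference engine:

```python
# Toy sketch of layer-wise inference: only one layer's weights live in RAM
# at a time; each layer is (re)loaded from storage when it is needed.
# All shapes and values are made up for illustration.
import json
import os
import tempfile

def save_layer(path, weights):
    with open(path, "w") as f:
        json.dump(weights, f)

def load_layer(path):
    with open(path) as f:
        return json.load(f)

def matvec(W, x):
    """Plain matrix-vector product, standing in for a transformer layer."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

# Build a tiny 3-layer "model" on disk (stand-in for an SSD-resident model).
tmp = tempfile.mkdtemp()
paths = []
for i in range(3):
    W = [[0.5 if r == c else 0.0 for c in range(4)] for r in range(4)]  # 0.5*I
    p = os.path.join(tmp, f"layer{i}.json")
    save_layer(p, W)
    paths.append(p)

# Inference: stream layers through RAM one at a time.
x = [1.0, 2.0, 3.0, 4.0]
for p in paths:
    W = load_layer(p)   # only this layer is resident in RAM
    x = matvec(W, x)
    del W               # evict before loading the next layer

print(x)  # each 0.5*I layer halves the vector: [0.125, 0.25, 0.375, 0.5]
```

Peak weight memory here is one layer instead of three, at the cost of a disk read per layer per forward pass, which is exactly the RAM-for-latency trade the comment describes.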
|