ph4rsikal 3 days ago

On-device moves all compute cost (incl. electricity) to the consumer. I.e., as of 2025 that means much less battery life, a much warmer device, and much higher electricity costs. Unless the M-series can do substantially more with less, this is a dead end.

veunes 3 days ago | parent | next [-]

That's fair for brute force (running a model on the GPU), but that's exactly where NPUs come in: they are considerably more energy-efficient for matrix operations than GPUs. Apple has been putting NPUs in every chip for years for a reason. For short, bursty tasks (answering a question, generating an image), the battery impact will be minimal. It's not 24/7 crypto mining; it's an impulse load.
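
For a concrete (if simplified) illustration: Core ML already lets a developer steer a model toward the Neural Engine instead of the GPU. A minimal coremltools sketch, where the tiny Torch model is just a stand-in:

    import torch
    import coremltools as ct

    # Stand-in model: any traced Torch module converts the same way.
    net = torch.nn.Sequential(torch.nn.Linear(64, 64), torch.nn.ReLU())
    traced = torch.jit.trace(net.eval(), torch.randn(1, 64))

    # Ask Core ML to prefer the Neural Engine (NPU) over the GPU.
    mlmodel = ct.convert(
        traced,
        inputs=[ct.TensorType(shape=(1, 64))],
        convert_to="mlprogram",
        compute_units=ct.ComputeUnit.CPU_AND_NE,  # CPU + Neural Engine only
    )
    mlmodel.save("tiny.mlpackage")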

WatchDog 3 days ago | parent | prev | next [-]

For the occasional local LLM query, running locally probably won't make much of a dent in battery life; smaller models like Mistral-7B can run at 258 tokens/s on an iPhone 17 [0].

The reason local LLMs are unlikely to displace cloud LLMs is memory footprint, and search. The most capable models require hundreds of GB of memory, which is impractical for consumer devices.
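
Back-of-envelope, the weights alone dominate: parameters times bytes per parameter. The figures below are illustrative assumptions, not tied to any specific model:

    # Weights-only footprint: params x bytes per param (ignores KV cache
    # and activations, which only add to the total).
    def weight_memory_gb(params_billions, bits_per_param):
        return params_billions * 1e9 * (bits_per_param / 8) / 1e9

    print(weight_memory_gb(400, 8))  # 400.0 GB at 8-bit: data-center territory
    print(weight_memory_gb(7, 4))    # 3.5 GB at 4-bit: fits on a phone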

I run Qwen 3 2507 locally using llama.cpp. It's not a bad model, but I still use cloud models more, mainly because they have good search RAG. There are local tools for this, but they don't work as well. That might keep improving, but I don't think it will get better than the Google/Bing API integrations the cloud models use.
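
For anyone curious, the local setup is only a few lines with the llama-cpp-python bindings; the GGUF path here is a placeholder for whatever quantized model you have on disk:

    from llama_cpp import Llama

    # Placeholder path: any local GGUF quant will do.
    llm = Llama(
        model_path="./qwen3-4b-instruct-2507-q4_k_m.gguf",
        n_ctx=8192,
        n_gpu_layers=-1,  # offload all layers to Metal/GPU where available
    )
    out = llm.create_chat_completion(
        messages=[{"role": "user", "content": "Summarize RAG in one sentence."}],
        max_tokens=128,
    )
    print(out["choices"][0]["message"]["content"])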

[0]: https://github.com/ggml-org/llama.cpp/discussions/4508

ph4rsikal 3 days ago | parent [-]

I used Mistral 7B a lot in 2023. It was a good model then. Now it's nowhere near where SOTA models are.

wooger 3 days ago | parent | prev | next [-]

For me, when the AI service is operated by the OS vendor, with root... what is the possible benefit of on-device processing?

* If you trust the OS vendor, why wouldn't you trust them to handle AI queries in a responsible, privacy-respecting manner?

* If you don't trust your OS vendor, you have a bigger problem than just privacy. Stop using it.

What makes people think that on-device processed queries can't be logged and sent off for analysis anyway?

reaperducer 3 days ago | parent [-]

> What is the possible benefit of on-device processing?

I envy your very simple, sedentary life where you are never outside of a high-speed wifi bubble.

Look at almost every Apple ad: It's people climbing rocks, surfing, skiing, enjoying majestic vistas, and all those things that very often come with reduced or zero connectivity.

Apple isn't trying to reach couch potatoes.

Marsymars 3 days ago | parent | prev | next [-]

Battery isn't relevant to plugged-in devices, and in the end, electricity costs roughly the same to generate and deliver to a data center as to a home. The real cost advantage the cloud has is better amortization of hardware, since powerful hardware can run at 100%, 24/7, shared across many people. I wouldn't bet on that continuing indefinitely; consumer hardware tends to catch up to HPC-exclusive workloads eventually.
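
To put rough numbers on the amortization point (every input below is an illustrative guess, not real pricing):

    # Hardware cost amortized per query. Cloud accelerators stay near
    # saturation; a home box serving one household is mostly idle.
    def cents_per_query(hw_cost_usd, lifetime_years, utilization,
                        queries_per_day_at_full_load):
        queries = lifetime_years * 365 * queries_per_day_at_full_load * utilization
        return 100 * hw_cost_usd / queries

    cloud = cents_per_query(30_000, 3, 0.90, 500_000)  # shared across many users
    home = cents_per_query(2_000, 3, 0.001, 500_000)   # same math, mostly idle
    print(f"cloud: {cloud:.4f} cents/query, home: {home:.2f} cents/query")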

fn-mote 3 days ago | parent | next [-]

You could have an Apple TV with 48 GB of VRAM backing the local requests, but... the trend is "real computers" disappearing from homes, replaced by tablets and phones. The advantage the cloud has is Real Compute Power for the few seconds you need to process the interaction. That's not coming home any time soon.

827a 2 days ago | parent | next [-]

Interestingly, some of Apple’s devices do already serve a special purpose like this in their ecosystem. The HomePod, HomePod Mini, and Apple TV act as Home Hubs for your network, which proxy WAN Apple Home requests to your IoT devices. No other Apple devices can do this.

They also already practice a concept of computational offloading with the Apple Watch and iPhone; more complicated fitness calculations, like VO2Max, rely on watch-collected data, but evidence suggests they're calculated on the phone (new VO2Max algorithms arrive when you update iOS, not watchOS).

So yeah: I can imagine a future where Apple devices offload substantial AI requests to other devices on your Apple account, optimizing for both power consumption (plugged in versus battery) and speed (a more powerful Mac versus your iPhone). There's good precedent in the Apple ecosystem for this. Then, of course, the highest tier of requests is processed in their private cloud.
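
None of this is a public Apple API; as a purely speculative sketch, the routing policy described above might look something like:

    from dataclasses import dataclass

    @dataclass
    class Device:
        name: str
        plugged_in: bool
        free_memory_gb: float
        reachable: bool

    def route(request_gb, phone, home_mac):
        # Prefer on-device; fall back to a plugged-in Mac on the LAN; else cloud.
        if request_gb <= phone.free_memory_gb:
            return phone.name
        if home_mac.reachable and home_mac.plugged_in \
                and request_gb <= home_mac.free_memory_gb:
            return home_mac.name
        return "private-cloud"

    print(route(12.0,
                Device("iPhone", False, 3.0, True),
                Device("Mac Studio", True, 48.0, True)))  # -> Mac Studio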

gowld 3 days ago | parent | prev [-]

My Sun Ray is back in style! $30 on eBay!

ph4rsikal 3 days ago | parent | prev [-]

One of those costs I see at the end of the month. The other I don't.

Marsymars 3 days ago | parent [-]

If the cloud AI is ad- or VC-supported, sure, but that doesn't seem like a sustainable way to provide a good user experience.

And don't worry, I'm sure some enterprising electricity company is working out how to give you free electricity in exchange for beaming more ads into your home.

SchemaLoad 3 days ago | parent | prev [-]

Apple runs all the heavy compute stuff overnight when your device is plugged in, so the electricity cost is effectively nothing and there's no impact on your battery life or device performance.
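
The "plugged in" gate is easy to express; on iOS it's the system scheduler's job, but a desktop-side analogue in Python (psutil's battery probe returns None on machines with no battery at all):

    import psutil

    def ok_to_run_heavy_job():
        batt = psutil.sensors_battery()
        # None means no battery at all (a desktop), which counts as plugged in.
        return batt is None or batt.power_plugged

    job = "overnight indexing / model warm-up"
    print(job if ok_to_run_heavy_job() else "defer until on AC power")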