| ▲ | atonse 3 days ago |
| As a sibling poster has said, I don't know how much on-device AI is going to matter. I have pretty strong views on privacy, and I've generally thrown them all out in light of using AIs, because the value I get out of them is just so huge. If Apple had actually executed on their strategy (of running models in privacy-friendly sandboxes), I feel they would've hit it out of the park. But as it stands, these are all bleeding-edge technologies and you have to have your best and brightest on them. And even with seemingly infinite money, Apple doesn't seem to have delivered yet. I hope the "yet" is important here. But judging by the various executives leaving (especially rumors of Johny Srouji leaving), the huge red flag is that they're bleeding talent, not that they lack money. |
|
| ▲ | twoodfin 3 days ago | parent | next [-] |
| I'm much more optimistic on device-side matmul. There's just so much of it in aggregate, and the marginal cost is so low, especially since you need to drive fancy graphics to the screen anyway. Somebody will figure out how to use it, complementing cloud-side matmul, of course, and Apple will be one of the biggest suppliers. |
|
| ▲ | scrollop 3 days ago | parent | prev | next [-] |
| You don't have to abandon privacy when using an AI: use a service that accesses enterprise APIs, which have good privacy policies. I use the service from the guys who create the This Day in AI podcast, called simtheory.ai. It gives access to all of the SOTA models, so you can flip between models, including lots of open-source ones, within one chat or across multiple chats and compare the same query, using various MCPs and lots of other features. If you're interested, have a look at the Discord for simtheory.ai. (I have no connection to the service or to the creators.) |
|
| ▲ | ebbi 3 days ago | parent | prev | next [-] |
| Johny Srouji sent out an email to his team confirming he is staying. |
| |
| ▲ | atonse 3 days ago | parent [-] | | That's huge. I hope they can keep people like him, because it isn't just about one person; it's all the other smart people who want to work with them. |
|
|
| ▲ | ph4rsikal 3 days ago | parent | prev [-] |
| On-device inference moves all compute cost (including electricity) to the consumer. As of 2025, that means much less battery life, a much warmer device, and much higher electricity costs. Unless the M-series can do substantially more with less, this is a dead end. |
| |
| ▲ | veunes 3 days ago | parent | next [-] | | That's fair for brute force (running a model on the GPU), but that's exactly where NPUs come in: they're orders of magnitude more energy-efficient for matrix operations than GPUs. Apple has been putting NPUs in every chip for years for a reason. For short, bursty tasks (answering a question, generating an image), the battery impact will be minimal. It's not 24/7 crypto mining; it's an impulse load. | |
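To make that NPU point concrete, here is a minimal sketch in Python using coremltools; the model file name is a placeholder, and Core ML ultimately decides op-by-op what actually runs on the Neural Engine:

```python
# Minimal sketch: asking Core ML to prefer the Neural Engine (NPU) over
# the GPU. Assumes coremltools is installed; "model.mlpackage" is a
# placeholder for a compiled Core ML model, not a real artifact.
import coremltools as ct

model = ct.models.MLModel(
    "model.mlpackage",
    compute_units=ct.ComputeUnit.CPU_AND_NE,  # prefer CPU + Neural Engine
)

# A single, bursty inference; the input name and shape depend on the
# model's spec, so this dict is illustrative only.
prediction = model.predict({"input": [[1.0, 2.0, 3.0]]})
print(prediction)
```

The compute_units setting is the knob an app gets; whether a given layer actually lands on the Neural Engine depends on the model's ops.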
| ▲ | WatchDog 3 days ago | parent | prev | next [-] | | For the occasional local LLM query, running locally probably won't make much of a dent in battery life; smaller models like Mistral 7B can run at 258 tokens/s on an iPhone 17 [0]. The reason local LLMs are unlikely to displace cloud LLMs is memory footprint, and search. The most capable models require hundreds of GB of memory, which is impractical for consumer devices. I run Qwen 3 2507 locally using llama.cpp; it's not a bad model, but I still use cloud models more, mainly because they have good search RAG. There are local tools for this, but they don't work as well. That might continue to improve, but I don't think it's going to get better than the API integrations with Google/Bing that cloud models use. [0]: https://github.com/ggml-org/llama.cpp/discussions/4508 | | |
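For anyone curious what that kind of local setup looks like, here is a rough sketch using the llama-cpp-python bindings; the GGUF path and parameters are placeholders, not WatchDog's actual configuration:

```python
# Rough sketch of a local llama.cpp chat call via llama-cpp-python.
# "qwen3.gguf" is a placeholder path to any GGUF-quantized model;
# quantization is what keeps the memory footprint consumer-sized.
from llama_cpp import Llama

llm = Llama(
    model_path="qwen3.gguf",  # placeholder model file
    n_ctx=4096,               # context window; larger costs more memory
    n_gpu_layers=-1,          # offload all layers to Metal/GPU if available
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Why are NPUs power-efficient?"}],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```

The missing piece, as noted above, is search: nothing in this loop can browse the web, which is where the cloud models' RAG integrations pull ahead.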
| ▲ | ph4rsikal 3 days ago | parent [-] | | I used Mistral 7B a lot in 2023. It was a good model then. Now it's nowhere near where SOTA models are. |
| |
| ▲ | wooger 3 days ago | parent | prev | next [-] | | For me, when the AI service is operated by the OS vendor, which has root... what is the possible benefit of on-device processing? * If you trust the OS vendor, why wouldn't you trust them to handle AI queries in a responsible, privacy-respecting manner? * If you don't trust your OS vendor, you have a bigger problem than just privacy. Stop using it. What makes people think that on-device processed queries can't be logged and sent off for analysis anyway? | | |
| ▲ | reaperducer 3 days ago | parent [-] | | What is the possible benefit of on-device processing? I envy your very simple, sedentary life where you are never outside of a high-speed Wi-Fi bubble. Look at almost every Apple ad: it's people climbing rocks, surfing, skiing, enjoying majestic vistas, and doing all those things that very often come with reduced or zero connectivity. Apple isn't trying to reach couch potatoes. |
| |
| ▲ | Marsymars 3 days ago | parent | prev | next [-] | | Battery isn't relevant to plugged-in devices, and in the end, electricity costs roughly the same to generate and deliver to a data center as to a home. The real cost advantage the cloud has is better amortization of hardware, since you can run powerful hardware at 100%, 24/7, spread across many users. I wouldn't bet on that continuing indefinitely; consumer hardware tends to catch up to HPC-exclusive workloads eventually. | |
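A back-of-the-envelope sketch of that amortization argument, with all numbers invented purely for illustration:

```python
# Illustrative amortization math with made-up numbers: a shared accelerator
# running near-constantly vs. a personal device doing a few minutes of
# inference per day. Only the ratio matters, not the absolute figures.
server_cost = 30_000        # hypothetical accelerator + host (dollars)
server_util = 0.9           # fraction of its life doing useful inference
device_cost = 1_000         # hypothetical premium for on-device AI silicon
device_util = 0.01          # roughly 15 minutes of inference per day

lifetime_hours = 3 * 365 * 24  # assume a 3-year useful life for both

print(f"server: ${server_cost / (lifetime_hours * server_util):.2f}/useful hour")
print(f"device: ${device_cost / (lifetime_hours * device_util):.2f}/useful hour")
```

With these invented numbers the shared server comes out roughly 3x cheaper per useful hour of compute, which is the amortization gap the comment describes.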
| ▲ | fn-mote 3 days ago | parent | next [-] | | You could have an Apple TV with 48 GB of VRAM backing the local requests, but... the trend is "real computers" disappearing from homes, replaced by tablets and phones. The advantage the cloud has is Real Compute Power for the few seconds you need to process the interaction. That's not coming home any time soon. | |
| ▲ | 827a 2 days ago | parent | next [-] | | Interestingly, some of Apple's devices already serve a special purpose like this in their ecosystem. The HomePod, HomePod mini, and Apple TV act as Home Hubs for your network, which proxy WAN Apple Home requests to your IoT devices. No other Apple devices can do this. They also already practice a form of computational offloading with the Apple Watch and iPhone; more complicated fitness calculations, like VO2 max, rely on watch-collected data, but evidence suggests they're calculated on the phone (new VO2 max algorithms are implemented when you update iOS, not watchOS). So yeah, I can imagine a future where Apple devices could offload substantial AI requests to other devices on your Apple account, to optimize for both power consumption (plugged in versus battery) and speed (if you have a more powerful Mac versus your iPhone). There's good precedent in the Apple ecosystem for this. Then, of course, the highest tier of requests is processed in their private cloud. |
| ▲ | gowld 3 days ago | parent | prev [-] | | My Sun Ray is back in style! $30 on eBay! |
| |
| ▲ | ph4rsikal 3 days ago | parent | prev [-] | | One of those costs I see at the end of the month. The other I don't. | |
| ▲ | Marsymars 3 days ago | parent [-] | | If the cloud AI is ad- or VC-supported, sure, but that doesn't seem like a sustainable way to provide a good user experience. And don't worry, I'm sure some enterprising electricity company is working out how to give you free electricity in exchange for beaming more ads into your home. |
|
| |
| ▲ | SchemaLoad 3 days ago | parent | prev [-] | | Apple runs all the heavy compute stuff overnight when your device is plugged in. The cost of the electricity is effectively nothing. And there is no impact on your battery life or device performance. |
|