veunes 5 hours ago
I noticed the inference is routed through the GPU rather than the Apple Neural Engine. IIRC, Google's engineers likely gave up on trying to compile custom attention kernels for Apple's proprietary tensor blocks. Metal is predictable and easy to port to, but it drains the battery far faster than a dedicated NPU would. Until they rewrite the backend for the ANE, this is a flashy tech demo rather than a production-ready tool.
jonathaneunice 4 hours ago
Are the Apple Neural Engines even a practical target for LLMs? Maybe not strictly impossible, but the ANE was designed for an earlier, pre-LLM style of ML. Running LLMs on the ANE (e.g. via Core ML) is possible in theory, but the substantial model conversion and custom hardware tuning required make for a high hurdle in practice. The LLM ecosystem standardized around CPU/GPU execution, and to date seems unwilling to devote resources to the ANE. Even Apple's own MLX framework has no ANE support. There are models the ANE runs well, but LLMs do not seem to be among them.
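For anyone curious what that conversion hurdle looks like in practice, here is a minimal sketch using coremltools. The toy block, shapes, and file name are made up for illustration; note that `compute_units` is only a request, and Core ML will silently fall back to CPU/GPU for any ops the ANE can't run.

```python
import torch
import torch.nn as nn
import coremltools as ct

# Toy stand-in for one model block (illustrative only, not a real LLM layer).
block = nn.Sequential(nn.Linear(2048, 2048), nn.GELU()).eval()
traced = torch.jit.trace(block, torch.randn(1, 128, 2048))

mlmodel = ct.convert(
    traced,
    inputs=[ct.TensorType(shape=(1, 128, 2048))],  # ANE favors fixed shapes;
                                                   # dynamic sequence lengths are painful
    compute_units=ct.ComputeUnit.CPU_AND_NE,       # a request, not a guarantee:
                                                   # unsupported ops fall back off-ANE
)
mlmodel.save("block.mlpackage")
```

And this is just the export step; getting attention/KV-cache patterns into an ANE-friendly form is where the real tuning work lives.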
GeekyBear 3 hours ago
It will be interesting to see how things change in a couple of months at WWDC, when Apple is said to be replacing their decade-old Core ML framework with something more geared toward modern LLMs.

> A new report says that Apple will replace Core ML with a modernized Core AI framework at WWDC, helping developers better leverage modern AI capabilities with their apps in iOS 27.

https://9to5mac.com/2026/03/01/apple-replacing-core-ml-with-...
liuliu 3 hours ago
The ANE is OK, but it pretty much needs you to pack your single vector into a batch of at least 128. (Draw Things recently shipped ANE support inside our custom inference stack, without any private APIs.) For token generation that is not ideal, unless you are using a drafter so there are more tokens to evaluate per inference step. It is an interesting area to explore, and yes, this is a tech demo. There is a long way to go to production-ready, but I am more optimistic now than a few months back (with Flash-MoE, DFlash, and some tricks I have).
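A back-of-the-envelope sketch of this point. The 128-wide packing figure comes from the comment above; the "engine" here is a made-up toy model, not real ANE timing data:

```python
import math

# Toy model: an engine that only processes work in full batches of WIDTH lanes,
# padding smaller batches with wasted lanes. This is why one-token-at-a-time
# decoding underuses a wide engine, and why a drafter (speculative decoding)
# proposing k tokens per step helps.
WIDTH = 128  # minimum packing width mentioned above

def utilization(tokens_per_step: int) -> float:
    """Fraction of lanes doing useful work in one step of the toy engine."""
    lanes = math.ceil(tokens_per_step / WIDTH) * WIDTH  # round up to full batches
    return tokens_per_step / lanes

print(utilization(1))    # plain autoregressive decode: 1/128 ≈ 0.0078
print(utilization(8))    # drafter proposing 8 tokens per step: 8/128 = 0.0625
print(utilization(128))  # 128-token prefill: full utilization
```

Under this toy model, plain decode wastes over 99% of the lanes, which matches the intuition that prefill (and drafted batches) are the ANE-friendly part of LLM inference.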
tjoff 5 hours ago
I'm certainly fine with it drawing some power. Running background processes might motivate using the NPU more, but that doesn't exactly feel like a pressing need. Having it actively listen to me 24/7 and analyze the data isn't a use case I'm eager to explore, given how little control we have over our own devices.
the_pwner224 5 hours ago
> Google’s engineers likely gave up on trying to compile custom attention kernels for Apple’s proprietary tensor blocks iirc.

The AI Edge Gallery app on Android (the officially recommended way to try out Gemma on phones) uses the GPU even on first-party Pixel phones; it lacks NPU support there too. So it's less "they didn't want to interface with Apple's proprietary tensor blocks" and more that they just didn't give a f in general. A truly baffling decision.
satvikpendem 3 hours ago
The Edge Gallery app on Android does have NPU support, but it requires a beta release of AICore, so I'm sure the devs are working on similar support for Apple devices too.
chatmasta 3 hours ago
Isn’t Apple paying Google billions of dollars to license these things? Surely they should make it easier to compile for their native engines…
InMice 3 hours ago
On my iPhone I can choose CPU or GPU in Edge Gallery. What would be the difference if I used the CPU?
bigyabai 3 hours ago
The ANE is not a fast or realistic way to run inference for modern LLMs.