JKCalhoun 14 hours ago

I wonder if we'll see these models running on the phone (aiPhone) hardware in the future.

alwillis 12 hours ago | parent | next [-]

As someone mentioned, this model is available in the beta version of iOS 26; it's also part of macOS 26, iPadOS 26 and visionOS 26. Anyone with a free developer account can install the developer betas; the public beta is expected next week.

There's a WWDC video "Meet the Foundation Models Framework" [1].

[1]: https://developer.apple.com/videos/play/wwdc2025/286
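For a taste of the API, here's a minimal sketch of calling the on-device model from Swift, based on what that session shows (these are beta APIs, so names may still change):

    import Foundation
    import FoundationModels

    // Minimal sketch (iOS 26 beta; API subject to change).
    func summarize(_ text: String) async throws -> String {
        // The model can be unavailable: Apple Intelligence turned off,
        // unsupported device, or the model is still downloading.
        guard case .available = SystemLanguageModel.default.availability else {
            throw CocoaError(.featureUnsupported)
        }
        let session = LanguageModelSession(
            instructions: "Summarize the user's text in one sentence."
        )
        let response = try await session.respond(to: text)
        return response.content
    }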

floam 13 hours ago | parent | prev | next [-]

It already does. You can use it directly on the iOS 26 beta: without writing a line of code, I can toy with the on-device model through Shortcuts on my 16 Pro. It’s not meant to be a general-purpose chatbot… but it can work as one in airplane mode, which is a novel experience.

https://share.icloud.com/photos/018AYAPEm06ALXciiJAsLGyuA

https://share.icloud.com/photos/0f9IzuYQwmhLIcUIhIuDiudFw

The above took like 3 seconds to generate. That little box that says On-device can be flipped between On-device, Private Cloud Compute, and ChatGPT.

Their LLM runs on the ANE, sipping battery and leaving the GPU available.

JKCalhoun 13 hours ago | parent | next [-]

Wild to see what improvements might come if there is additional hardware support in future Apple Silicon chips.

ivape 12 hours ago | parent | prev | next [-]

What’s the cost of pointing it to Private Cloud Compute? It can’t be free, can it?

floam 9 hours ago | parent [-]

It’s “free”, as in it doesn’t charge you anything or require a subscription: it’s part of Apple Intelligence, which is effectively something you bought with the device. It’s in the cloud, so in theory you shouldn’t need a very new iPhone or Mac, but in practice you do.

bigyabai 13 hours ago | parent | prev [-]

It would be interesting to see the tok/s comparison between the ANE and GPU for inference. I bet these small models are a lot friendlier than the 7B/12B models that technically fit on a phone but won't accelerate well without a GPU.

gleenn 13 hours ago | parent | next [-]

I thought the big difference between the GPU and the ANE was that you can't use the ANE to train. Does the GPU actually perform inference faster as well? Is that because the ANE is designed more for efficiency, or is there another, bigger reason?

wmf 13 hours ago | parent [-]

GPUs are usually faster for inference simply because they have more ALUs/FPUs, but they are also less power-efficient.

mrheosuper 7 hours ago | parent | prev [-]

Fitting a 7B model on a phone with 8 GB of RAM for the whole system is impressive.

kingnothing 13 hours ago | parent | prev | next [-]

> The new Foundation Models framework gives access to developers to start creating their own reliable, production-quality generative AI features with the approximately 3B parameter on-device language model. The ∼3B language foundation model at the core of Apple Intelligence excels at a diverse range of text tasks like summarization, entity extraction, text understanding, refinement, short dialog, generating creative content, and more. While we have specialized our on-device model for these tasks, it is not designed to be a chatbot for general world knowledge. We encourage app developers to use this framework to design helpful features tailored to their apps
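The interesting part for developers is guided generation: you declare a Swift type and the framework constrains the model's output to match it. Roughly, per the WWDC session (beta API, so treat this as a sketch; the type here is made up):

    import FoundationModels

    // Entity extraction via guided generation: the framework
    // constrains the model's output to match this Swift type.
    @Generable
    struct TripSummary {
        @Guide(description: "A short title for the trip")
        let title: String
        @Guide(description: "Cities mentioned in the text")
        let cities: [String]
    }

    func extractTrip(from text: String) async throws -> TripSummary {
        let session = LanguageModelSession()
        let response = try await session.respond(
            to: "Extract the trip details from: \(text)",
            generating: TripSummary.self
        )
        return response.content
    }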

Zee2 13 hours ago | parent | prev [-]

> a ∼3B-parameter on-device model

ThomasBb 13 hours ago | parent | next [-]

There are even already some local AFM-to-OpenAI-API bridge projects on GitHub that let you point basically any OpenAI-compatible client at the local models. Super nice for basic summarisation and completions.
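Since those bridges speak the standard OpenAI wire format, any client works. A hypothetical sketch in Swift (the port, path, and model name all depend on the particular bridge project):

    import Foundation

    // Hypothetical: assumes a bridge serving the OpenAI-style API
    // at localhost:8080; port and model name vary per project.
    func askLocalModel(_ prompt: String) async throws -> String {
        var request = URLRequest(url: URL(string: "http://localhost:8080/v1/chat/completions")!)
        request.httpMethod = "POST"
        request.setValue("application/json", forHTTPHeaderField: "Content-Type")
        request.httpBody = try JSONSerialization.data(withJSONObject: [
            "model": "apple-on-device",  // placeholder name
            "messages": [["role": "user", "content": prompt]]
        ])
        let (data, _) = try await URLSession.shared.data(for: request)
        // Dig the first choice's message content out of the reply.
        let json = try JSONSerialization.jsonObject(with: data) as? [String: Any]
        let choices = json?["choices"] as? [[String: Any]]
        let message = choices?.first?["message"] as? [String: Any]
        return message?["content"] as? String ?? ""
    }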

JKCalhoun 13 hours ago | parent | prev [-]

I was worried "device" meant a Mac mini, not an iPhone. (I've already been running models on my MacBook Pro.)