▲ yalogin 3 hours ago
Apple’s unified memory architecture plays a huge part in this. It will trigger a large-scale rearchitecture of mobile hardware across the board; I’m sure those efforts are already underway. I understand this is a demo, but do we really need a 400B model on a phone? A 10B model would do fine, right? What do we miss with a pared-down one?
▲ Aurornis 3 hours ago
> Apple’s unified memory architecture plays a huge part in this. This will trigger a large scale rearchitecture of mobile hardware across the board. I am sure they are already underway.

Putting the GPU and CPU together and having them both access the same physical memory is standard for phone design. Mobile phones don't have separate GPUs with separate VRAM like some desktops. This isn't a new thing, and it's not unique to Apple.

> I understand this is for a demo but do we really need a 400B model in the mobile? A 10B model would do fine right? What do we miss with a pared down one?

There is already a smaller model in this series that fits nicely on an iPhone (with some quantization): Qwen3.5 9B. The smaller the model, the less accurate and capable it is. That's the tradeoff.
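To make the "fits with some quantization" point concrete, here is a rough back-of-the-envelope sketch of the memory needed just to hold a dense model's weights at different bit widths. The helper name and the overhead-free math are illustrative assumptions (real runtimes also need KV cache, activations, and per-tensor quantization metadata); the 400B and 9B figures come from the thread.

```python
def weight_memory_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate GiB needed to hold the weights alone
    (ignores KV cache, activations, and runtime overhead)."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30

for params in (400, 9):
    for bits in (16, 8, 4):
        gib = weight_memory_gib(params, bits)
        print(f"{params}B @ {bits}-bit: {gib:.1f} GiB")
```

At 4-bit quantization the 9B model's weights come out around 4 GiB, which plausibly fits in a modern phone's unified memory, while the 400B model still needs on the order of 200 GiB — hence the interest in unified-memory machines with very large RAM pools.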
| ||||||||||||||||||||||||||
▲ root_axis 3 hours ago
Compared to a 400B model, a 10B is practically useless; it's not worth bothering with outside of tinkering for fun and research.
| ||||||||||||||||||||||||||
▲ refulgentis 3 hours ago
What do we miss? TL;DR: a lot; the model is much worse. (Source: I've been maintaining a llama.cpp / cloud-based LLM provider app for 2-3 years now.)