▲ yalogin 3 hours ago
Apple’s unified memory architecture plays a huge part in this. It will trigger a large-scale rearchitecture of mobile hardware across the board; I’m sure those efforts are already underway. I understand this is a demo, but do we really need a 400B model on a phone? A 10B model would do fine, right? What do we miss with a pared-down one?
▲ Aurornis 3 hours ago
> Apple’s unified memory architecture plays a huge part in this. This will trigger a large scale rearchitecture of mobile hardware across the board. I am sure they are already underway.

Putting the GPU and CPU together and having them both access the same physical memory is standard for phone design. Mobile phones don't have separate GPUs with separate VRAM like some desktops. This isn't a new thing, and it's not unique to Apple.

> I understand this is for a demo but do we really need a 400B model in the mobile? A 10B model would do fine right? What do we miss with a pared down one?

There is already a smaller model in this series that fits nicely on an iPhone (with some quantization): Qwen3.5 9B. The smaller the model, the less accurate and capable it is. That's the tradeoff.
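To make the "fits with some quantization" point concrete, here is a rough back-of-the-envelope sketch of the memory needed just to hold a dense model's weights at different bit widths. The helper name and the overhead-free math are illustrative assumptions (real runtimes also need KV cache, activations, and per-tensor quantization metadata); the 400B and 9B figures come from the thread.

```python
def weight_memory_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate GiB needed to hold the weights alone
    (ignores KV cache, activations, and runtime overhead)."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30

for params in (400, 9):
    for bits in (16, 8, 4):
        gib = weight_memory_gib(params, bits)
        print(f"{params}B @ {bits}-bit: {gib:.1f} GiB")
```

At 4-bit quantization the 9B model's weights come out around 4 GiB, which plausibly fits in a modern phone's unified memory, while the 400B model still needs on the order of 200 GiB — hence the interest in unified-memory machines with very large RAM pools.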
| ||||||||||||||||||||||||||
▲ root_axis 3 hours ago
Compared to a 400B model, a 10B is practically useless; it's not worth bothering with outside of tinkering for fun and research.
| ||||||||||||||||||||||||||
▲ refulgentis 3 hours ago
What do we miss? TL;DR: a lot; the model is much worse. (Source: I've been maintaining a llama.cpp / cloud-based LLM provider app for 2-3 years now.)