| ▲ | scosman 10 hours ago | |
I think that's what they are trying to avoid. If you need on-device intelligence, their pitch was "The model the device already has is best", and if you need something more specific, an adapter (aka a fine-tune/LoRA) is best. They were wrong when their on-device model was way behind. They still might be right in the long term. While multiple apps I use might need Gemma 4 E4B, I use dozens of apps and app devs can choose from hundreds of models. A shared cache might reduce size a little when there's overlap, but the core problem still exists. If each app chooses its own model, disk usage and memory swapping explode. It's probably better for device manufacturers to bake in a default. I'm not proposing they limit you from using others, but one shared default might be the best developer/user experience for 99% of apps. - Being warm in memory is the single biggest perf speedup you can get, and a default is much more likely to be warm. - "Best model" is usually "best model for this device" given both RAM and compute. A developer can't test every device but Apple can/will. - Each model needs to be optimized for the hardware (what's running on ANE, what's running on Metal, what's running on CPU). The default gets optimized. - If you need a custom model, a LoRA is probably best (30MB, benefits from all of the above). You could say the default should be swappable, but that's more a Linux ideal than an Apple one, so I doubt we'll ever see that. Plus there are real downsides: intentional or not, prompts end up optimized for the model they are developed against, so swapping the default system model would degrade every app. | ||
| ▲ | scotty79 5 hours ago | parent [-] | |
But models aren't universally best, especially small ones. For text Gemma is great. For vision qwen3.6 is amazing. | ||