redbell 10 hours ago

Another related submission from 22 days ago: "iPhone 17 Pro Demonstrated Running a 400B LLM" (+700 pts, +300 comments): https://news.ycombinator.com/item?id=47490070

zozbot234 9 hours ago | parent [-]

That's very impressive, but it's streaming weights in from flash storage, which isn't really viable in a mobile context: it uses far too much power. Smaller models are much more applicable to typical use, perhaps with mid-sized models (like the Gemma4 26A4B model) offloading weights to SSD for the rare uses that justify slower "pro" inference.
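A rough back-of-envelope illustrates why streaming weights from flash is so costly. Every number below is an assumption for illustration (the per-byte energy figures, the hypothetical 3 GB small model, and the hypothetical 20 GB of active weights streamed per token for a large MoE model), not a measurement of any real device:

```python
# Back-of-envelope: energy spent just *moving weights* per generated token,
# comparing a small DRAM-resident model to a large model streamed from flash.
# All constants are rough illustrative assumptions, not device measurements.

FLASH_J_PER_BYTE = 1e-9   # assume ~1 nJ/byte (e.g. an SSD reading 3 GB/s at ~3 W)
DRAM_J_PER_BYTE = 1e-10   # assume ~0.1 nJ/byte for LPDDR reads

def energy_per_token(bytes_read: float, j_per_byte: float) -> float:
    """Joules spent moving weights for one token at the given cost per byte."""
    return bytes_read * j_per_byte

# Hypothetical small on-device model: ~3 GB of weights, all read from DRAM.
small_j = energy_per_token(3e9, DRAM_J_PER_BYTE)

# Hypothetical large MoE model: ~20 GB of active weights streamed from flash
# per token (assuming only a fraction of a ~200 GB quantized model is active).
large_j = energy_per_token(20e9, FLASH_J_PER_BYTE)

print(f"small model (DRAM):  {small_j:.2f} J/token")
print(f"large model (flash): {large_j:.2f} J/token")
print(f"ratio: {large_j / small_j:.0f}x")
```

Under these assumed numbers the flash-streamed model costs tens of joules per token in data movement alone, which is why even occasional use drains a phone battery (a typical phone battery holds roughly 40-60 kJ).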

hadlock 3 hours ago | parent [-]

10 minutes a day of extreme power usage is probably fine for people asking for directions to the store, setting calendar reminders and timers, checking for important emails, etc. AI on your phone will be incredibly useful, and power usage doesn't matter much when total usage is under 15 minutes per day. I don't think the average person expects to vibe-code on their phone for 8 hours a day.

zozbot234 2 hours ago | parent [-]

10 or 15 minutes a day describes the inference workload on fairly small models. Once you start streaming weights in from SSD, things slow down considerably, so the same tasks take longer and become quite power hungry.