I think that's probably true for the vast majority of Android phones. But if you have a SOTA expensive beast, I wonder if Gemma 4 12B at 4 bit could work? Maybe something like a Redmagic 11 pro or OnePlus 13 running NanoClaw?

But also maybe a few Qwen 3.6 or Qwen 3.5 variants can fit and can handle some simple tasks.

▲

redox99 an hour ago | parent [-]

I think Gemma 4 12B is definitely possible to run on high end phones, google claims you need 16GB of memory. But it's probably not very usable, you'll need to swap most stuff other than the LLM.

When I tried E2B and E4B with Google Edge Gallery, and added a web search skill from the skill list, E2B would fail (get stuck in a loop), E4B would need a very specific instruction, "weather in [city name]" would not call the web search tool, I'd need "web search weather in [city name]". And the result was completely hallucinated and impossible. It claimed 14c and feels like 4c (which is impossible), and 10% humidity (which is almost impossible in this city)

Asking wikipedia level history questions (without any tool use), the results were awful as well.

▲

satvikpendem 26 minutes ago | parent [-]

I'm running a service in production using Gemma 4 models, to get structured JSON output back from web search tool calls using Unsloth Studio and its API, but it did require a rather large and detailed system prompt and tool call healing if the format wasn't JSON for example (retries, reprompting with feeding the error back into the model, etc, this is also what Unsloth Studio does for its self-healing tool call feature). But once I did that, it's been working quite well and on benchmarks I've made, it's about 97% accurate after the first time and basically 100% accurate after retries.

This is running on a server though, not sure how well it'd work on a phone, I should try that. I used AI Edge Gallery on Android and it doesn't seem too good at the web search tool but maybe the web search tool itself, being a community made tool, is pretty bad, because tool calling via Unsloth Studio seems to work just fine with the exact same Gemma models on desktop/server vs the phone.

	▲	redox99 11 minutes ago \| parent [-]
		I agree that the web search tool probably is pretty bad. However a smart model would never hallucinate impossible weather data if the search tool failed. I'm sure you can get some out of it if you babysit it with an optimized prompt, harness, etc and you can tolerate some failures. But when I try to run the ChatGPT prompts from my history, even if I pick the easier ones, it's hopeless. I'd like to have a local agent on the phone with wikipedia level knowledge. But you probably need more like 30B params.