Remix.run Logo
satvikpendem an hour ago

I'm running a service in production using Gemma 4 models, to get structured JSON output back from web search tool calls using Unsloth Studio and its API, but it did require a rather large and detailed system prompt and tool call healing if the format wasn't JSON for example (retries, reprompting with feeding the error back into the model, etc, this is also what Unsloth Studio does for its self-healing tool call feature). But once I did that, it's been working quite well and on benchmarks I've made, it's about 97% accurate after the first time and basically 100% accurate after retries.

This is running on a server though, not sure how well it'd work on a phone, I should try that. I used AI Edge Gallery on Android and it doesn't seem too good at the web search tool but maybe the web search tool itself, being a community made tool, is pretty bad, because tool calling via Unsloth Studio seems to work just fine with the exact same Gemma models on desktop/server vs the phone.

redox99 an hour ago | parent [-]

I agree that the web search tool probably is pretty bad. However a smart model would never hallucinate impossible weather data if the search tool failed.

I'm sure you can get some out of it if you babysit it with an optimized prompt, harness, etc and you can tolerate some failures. But when I try to run the ChatGPT prompts from my history, even if I pick the easier ones, it's hopeless.

I'd like to have a local agent on the phone with wikipedia level knowledge. But you probably need more like 30B params.

satvikpendem an hour ago | parent [-]

I use the 4B on my phone and it seems to work fine without tool calls. So it's definitely an issue with that and not the model itself. I'll play around and see if I can fix that, you might also try using the Searxng MCP as it's a better web search engine one.

redox99 41 minutes ago | parent [-]

I tried most prompts that didn't rely on recent knowledge on the basic "AI Chat", not the "Agent skills" version.

I just tested "List the 5 most recent Argentina vice presidents" on E4B and it literally got all 5 wrong

satvikpendem 29 minutes ago | parent [-]

I use it for recommendations rather than knowledge, like recipes or basic stuff like that rather than knowledge, I mean it's likely due to its knowledge cutoff so it's not necessarily accurate. But the agent skills section does have a query Wikipedia tool call.

Try this on Unsloth Studio, they seem to have fixed Gemma tool calling.

redox99 27 minutes ago | parent [-]

Argentina vice presidents span from 2007 to 2023. Knowledge cutoff cant explain getting all 5 of them wrong.