Remix.run Logo
whazor 4 hours ago

Would be interesting to use local models for:

- tool calling

- code base exploration

- anonymizing / abstracting your request

Such that your local AI communicates to frontier model like an expensive consultant giving high level advice.

I think due to the lower latency of a local model that this could be faster.

asimovDev 3 hours ago | parent | next [-]

I used Qwen 27b 8 bit MLX version on a decompiled android APK recently. It succesfully identified how it worked even the obfuscated classes and methods. It wrote a 1000 line documentation with examples but the time was dreadful. At some point it slowed down to 5 t/s so the whole thing took over an hour , the writing of documentation alone was over 40 minutes, fans blasting the entire time.

trey-jones an hour ago | parent [-]

I know it uses electricity, but part of the benefit of a local model has to be that you can let it do this while you sleep, and not pay Anthropic for an unknown number of tokens.

asimovDev an hour ago | parent [-]

yeah i totally understand and I am thoroughly impressed it works. And the electricity cost isn't that bad since it was on a ARM laptop (MacBook M3 Max) and not a beefy workstation with a GPU. I just let the agent do its work while watching the World Cup.

dofm 3 hours ago | parent | prev [-]

I doubt your experience of local models would be of lower latency, except for quite small models in edge uses.

In every way, the cloud products from the big two seem optimised for speed and speed of initial response even.

I don’t think most people are running local models for speed. More for control, privacy, interest, bloody-mindedness and general principle.