| ▲ | whazor 4 hours ago | ||||||||||||||||
Would be interesting to use local models for: - tool calling - code base exploration - anonymizing / abstracting your request Such that your local AI communicates to frontier model like an expensive consultant giving high level advice. I think due to the lower latency of a local model that this could be faster. | |||||||||||||||||
| ▲ | asimovDev 3 hours ago | parent | next [-] | ||||||||||||||||
I used Qwen 27b 8 bit MLX version on a decompiled android APK recently. It succesfully identified how it worked even the obfuscated classes and methods. It wrote a 1000 line documentation with examples but the time was dreadful. At some point it slowed down to 5 t/s so the whole thing took over an hour , the writing of documentation alone was over 40 minutes, fans blasting the entire time. | |||||||||||||||||
| |||||||||||||||||
| ▲ | dofm 3 hours ago | parent | prev [-] | ||||||||||||||||
I doubt your experience of local models would be of lower latency, except for quite small models in edge uses. In every way, the cloud products from the big two seem optimised for speed and speed of initial response even. I don’t think most people are running local models for speed. More for control, privacy, interest, bloody-mindedness and general principle. | |||||||||||||||||