bytefactory | 2 days ago
I had long been of the opinion that local models were far from useful - toys at best. I'm a heavy user of o3/GPT-5, Claude Opus/Sonnet and Gemini 2.5 Pro, so my expectations were sky high. Then I tried Gemma 27B on LM Studio a few days ago, and I was completely blown away! It has a warmth and character (and smarts!) that I was not expecting in such a small model. It just doesn't have tool use (although there are hacky workarounds), which would have made it even better. Qwen 3 with 30B parameters (3B active) seems to be nearly as capable, and it does support tool use.

I'm currently vibe coding an agent network: LangGraph orchestration around Gemma 27B/Qwen 3 30B-A3B, with memory, context management and tool management (a minimal sketch is below). The Qwen model even uses a tiny 1.7B "draft" model for speculative decoding, which improves throughput. On my 7800X3D with an RTX 4090 and 64GB RAM, I get ~200-400ms latency and 20-30 tokens/s, which is plenty fast.

My thinking is that this local stack will let me use agents to their fullest in administering my machine. I always felt uneasy letting Claude Code, Gemini CLI or Codex operate outside my code folders, yet their utility in helping me troubleshoot problems (I'm a recent Linux convert) was too attractive to ignore. Now I have the best of both worlds: privacy, and AI models helping with sysadmin. They're also great for quick "what options does kopia backup use?" questions, for which I've set up a globally hotkeyed helper (second sketch below). Additionally, if one has a NAS with the *arr stack for downloading, say, perfectly legal Linux ISOs, such a private model is far more suitable.

It's early days, but I'm excited about the other use cases I might discover over time! It's a good time to be an AI enthusiast.
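For anyone curious what the agent side looks like, here's a minimal sketch wiring LangGraph's prebuilt ReAct agent to LM Studio's OpenAI-compatible server (it listens on localhost:1234 by default). The model id string and the run_command tool are illustrative assumptions, not my exact setup - use whatever id LM Studio shows for your loaded model, and sandbox any shell tool before letting an agent near it:

    import subprocess

    from langchain_openai import ChatOpenAI
    from langchain_core.tools import tool
    from langgraph.prebuilt import create_react_agent

    @tool
    def run_command(cmd: str) -> str:
        """Run a shell command and return its output (example tool; sandbox this!)."""
        out = subprocess.run(cmd, shell=True, capture_output=True, text=True, timeout=30)
        return out.stdout or out.stderr

    # Point the standard OpenAI-style client at the local server instead of a cloud API.
    llm = ChatOpenAI(
        base_url="http://localhost:1234/v1",  # LM Studio's OpenAI-compatible endpoint
        api_key="lm-studio",                  # any non-empty string works locally
        model="qwen3-30b-a3b",                # assumption: id as shown in LM Studio
    )

    agent = create_react_agent(llm, [run_command])
    result = agent.invoke(
        {"messages": [("user", "What options does kopia backup support?")]}
    )
    print(result["messages"][-1].content)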
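The hotkeyed helper is barely more than a one-shot completion against the same local endpoint: bind a script like the following to a hotkey in your DE and pass the question as arguments. Again, the model id is an assumption; substitute whatever LM Studio reports:

    import sys

    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

    def quick_ask(question: str) -> str:
        """One-shot question to the local model, no history kept."""
        resp = client.chat.completions.create(
            model="gemma-3-27b",  # assumption: id as loaded in LM Studio
            messages=[{"role": "user", "content": question}],
        )
        return resp.choices[0].message.content

    if __name__ == "__main__":
        print(quick_ask(" ".join(sys.argv[1:])))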