| ▲ | garethsprice 3 hours ago | |
Using OpenCode + OhMyOpenCode + Qwen 3.6 35B-A3B Q_4_KM on an Ada 4000 (20GB VRAM) at 55 tok/sec for generation (slower than it sounds as OpenCode has a bunch of context it adds). Meaning to check out pi when I get a minute as I hear that one mentioned a lot lately. I am using Opus to generate plans that the local agent then follows, then validated by Opus. So I'm not at 100% local but these models are increasingly part of my production workflow. Probably not worth doing - yet - unless you are a hobbyist who likes spending time and money tinkering. This setup is certainly not as "good" as Opus or other frontier models but they are "good enough" for an increasing number of rote tasks. You don't need to drive a Rolls Royce to the supermarket, when a used Corolla gets you there just fine. It also enables new workflows that would be cost-prohibitive with frontier LLMs (especially as token costs rise) - eg. overnight I use the Chrome devtools MCP and have the above setup fuzz-test as a user for a number of hours and see if it can break things. Even got it working with multi-modal so it can check screenshots, which blows my mind (and not my wallet, as Claude+screenshots burns $$$). The "12-18 months behind frontier" sounds about right, it's about where I was with gpt-4o and basic harnesses back then. In another 12-18 months my bet is we have Opus-level models that can be run locally for <$5k... but the frontier models will be even further forward (unless governments have blocked them). Fun times. | ||