| ▲ | blahblaher 4 hours ago |
| Qwen 3.5/3.6 (30B) works well locally with OpenCode |
|
| ▲ | zozbot234 4 hours ago | parent | next [-] |
| Mind you, a 30B model (3B active) is not going to be comparable to Opus. There are open models that are near-SOTA but they are ~750B-1T total params. That's going to require substantial infrastructure if you want to use them agentically, scaled up even further if you expect quick real-time response for at least some fraction of that work. (Your only hope of getting reasonable utilization out of local hardware in single-user or few-users scenarios is to always have something useful cranking in the background during downtime.) |
| |
| ▲ | pitched 4 hours ago | parent | next [-] | | For a business with ten or more engineers/people-using-ai, it might still make sense to set this up. For an individual though, I can’t imagine you’d make it through to positive ROI before the hardware ages out. | | |
| ▲ | zozbot234 3 hours ago | parent | next [-] | | It's hard to tell for sure because the local inference engines/frameworks we have today are not really that capable. We have barely started exploring the implications of SSD offload, saving KV-caches to storage for reuse, setting up distributed inference in multi-GPU setups or over the network, making use of specialty hardware such as NPUs etc. All of these can reuse fairly ordinary, run-of-the-mill hardware. | |
| ▲ | DeathArrow 3 hours ago | parent | prev [-] | | Since you need at least a few units of H100-class hardware, I guess you need at least a few tens of coders to justify the costs. |
| |
| ▲ | wuschel 3 hours ago | parent | prev | next [-] | | Which near-SOTA open models are you referring to? | |
| ▲ | cyberax 3 hours ago | parent | prev [-] | | I'm backing up a big dataset onto tapes, so I wanted to automate it. I have an idle 64GB VRAM setup in my basement, so I decided to experiment and tasked it with writing an LTFS implementation. LTFS is an open standard for filesystems for tapes, and there's an implementation in C that can be used as the baseline. Within the last 2 days, Qwen 3.6 has produced a functionally equivalent Golang implementation that works against the flat file backend. I'm extremely impressed. | | |
| ▲ | Gareth321 20 minutes ago | parent [-] | | It is surprisingly competent. It's not Opus 4.6, but it works well for well-structured tasks. |
|
|
|
| ▲ | pitched 4 hours ago | parent | prev | next [-] |
| I want to bump this more than just a +1 by recommending everyone try out OpenCode. It can still run on a Codex subscription, so you aren’t in fully unfamiliar territory, but it unlocks a lot of options. |
| |
| ▲ | zozbot234 4 hours ago | parent | next [-] | | The Codex TUI harness is also open source and you can use open models with it, so you can stay in even more familiar territory. | |
| ▲ | pwython 4 hours ago | parent | prev [-] | | pi-coding-agent (pi.dev) is also great. I've been using it with Gemma 4 and Qwen 3.6. |
|
|
| ▲ | jherdman 4 hours ago | parent | prev | next [-] |
| Is this sort of setup tenable on a consumer MBP or similar? |
| |
| ▲ | Gareth321 19 minutes ago | parent | next [-] | | The Mac Minis (probably 64GB RAM) are the most cost effective. | |
| ▲ | danw1979 4 hours ago | parent | prev | next [-] | | Qwen’s 30B models run great on my MBP (M4, 48GB), but the issue I have is cooling: the fan exhaust blows straight onto the screen, which I can’t help thinking will eventually degrade it, given the thermal cycling it would go through. A Mac Studio makes far more sense for local inference for this reason alone. | |
| ▲ | pitched 4 hours ago | parent | prev [-] | | For a 30B model, you want at least 20GB of VRAM and a 24GB MBP can’t quite allocate that much of it to VRAM. So you’d want at least a 32GB MBP. | | |
| ▲ | richardfey 3 hours ago | parent | next [-] | | I have 24GB VRAM available and haven't yet found a decent model or combination. The last one I tried was Qwen with Continue; I guess I need to spend more time on this. | |
| ▲ | zozbot234 4 hours ago | parent | prev | next [-] | | It's a MoE model so I'd assume a cheaper MBP would simply result in some experts staying on CPU? And those would still have a sizeable fraction of the unified memory bandwidth available. | | |
| ▲ | pitched 3 hours ago | parent [-] | | I haven’t tried this myself yet, but you would still need enough non-VRAM RAM available to the CPU to hold the offloaded experts, right? This is a complete novice question; I have never tried it. |
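A rough back-of-envelope sketch of the sizing discussed above, not a measurement. It assumes a 4-bit-style quantization averaging about 4.5 bits per weight and about 3 GB of headroom for KV cache and runtime buffers; both figures are illustrative assumptions, as is the hypothetical 16 GB GPU budget standing in for a machine that cannot hand its full memory to the GPU. Note that all 30B parameters of a MoE model have to live somewhere (VRAM or system RAM), even though only about 3B are active per token.

```python
def weights_gb(total_params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes)."""
    return total_params * bits_per_weight / 8 / 1e9


def fits(total_params: float, bits_per_weight: float,
         overhead_gb: float, budget_gb: float) -> bool:
    """True if weights plus overhead fit inside the given memory budget."""
    return weights_gb(total_params, bits_per_weight) + overhead_gb <= budget_gb


if __name__ == "__main__":
    params = 30e9        # total parameters (MoE: all experts count, not just active ones)
    bits = 4.5           # assumed average bits per weight after quantization
    overhead = 3.0       # assumed GB for KV cache and runtime buffers

    print(f"weights: {weights_gb(params, bits):.1f} GB")
    print(f"fits in 20 GB: {fits(params, bits, overhead, 20.0)}")
    # Hypothetical smaller GPU budget, for comparison only.
    print(f"fits in 16 GB: {fits(params, bits, overhead, 16.0)}")
```

Under these assumptions the total lands just under 20 GB, which is consistent with the "at least 20GB" figure; a longer context or a higher-precision quantization would push it up.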
| |
| ▲ | _blk 3 hours ago | parent | prev [-] | | Is there any model that practically compares to Sonnet 4.6 in code and vision and runs on home-grade (12GB-24GB) cards? | | |
| ▲ | macwhisperer an hour ago | parent [-] | | I'm currently running a custom Gemma 4 26B MoE model on my 24GB M2... super fast, and it beat DeepSeek, ChatGPT, and Gemini in 3 different puzzles/code challenges I tested it on. The issue now is the low context... I can only do 2048 tokens with my VRAM... The gap to the frontier models is slowly closing. |
|
|
|
|
| ▲ | cpursley 4 hours ago | parent | prev [-] |
| How are you running it with OpenCode? Any tips or pointers on the setup? |