beej71 7 hours ago

News like this always makes me wonder about running my own model, something I've never done. A couple thousand bucks can get you some decent hardware, it looks like, but is it good for coding? What is your all's experience?

And if it's not good enough for coding, what kind of money, if any, would make it good enough?

arcanemachiner 6 hours ago | parent | next [-]

I want to give you realistic expectations: Unless you spend well over $10K on hardware, you will be disappointed, and you will spend a lot of time getting there. For sophisticated coding tasks, at least. (For simple agentic work, you can get workable results with a 3090 or two, or even a couple of 3060 12GBs for half the price. But they're pretty dumb, and it's a tease. Hobby territory, lots of dicking around.)

Do yourself a favor: Set up OpenCode and OpenRouter, and try all the models you want to try there.
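If you want to kick the tires with even less setup, OpenRouter exposes an OpenAI-compatible chat completions endpoint you can hit directly. A minimal sketch (you need your own API key, and the model slug here is illustrative, not an endorsement):

```shell
# Smoke-test a model via OpenRouter's OpenAI-compatible API.
# Assumes OPENROUTER_API_KEY is set in your environment; the
# model slug is illustrative -- swap in whatever you want to try.
curl -s https://openrouter.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
        "model": "qwen/qwen3-coder",
        "messages": [{"role": "user", "content": "Write a function that reverses a linked list."}]
      }'
```

Changing the slug lets you compare models on the same prompt before committing to any hardware.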

Other than the top performers (e.g. GLM 5.1, Kimi K2.5, where required hardware is basically unaffordable for a single person), the open models are more trouble than they're worth IMO, at least for now (in terms of actually Getting Shit Done).

_345 6 hours ago | parent [-]

We need more voices like this to cut through the bullshit. It's fine that people want to tinker with local models, but for too long there's been this narrative that you can just buy more RAM, run some small-to-medium-sized model, and be productive that way. You just can't: a 35B will never perform at the level of a same-generation 500B+ model. You're basically working with GPT-4 (the very first one to launch) tier performance while everyone else is on GPT-5.4. If that's fine for you because you get to stay local, cool, but that's the part no one ever wants to say out loud, and it made me think I was just "doing it wrong" for so long on LM Studio and ollama.

zozbot234 5 hours ago | parent | next [-]

> We need more voices like this to cut through the bullshit.

Open models are not bullshit; they work fine in many cases, and newer techniques like SSD offload make even 500B+ models accessible for simple uses (NOT real-time agentic coding!) on very limited hardware. Of course, if you want the full-featured experience, it's going to cost a lot.

solenoid0937 4 hours ago | parent [-]

I fell for this stuff, went down the open+local model rabbit hole, and am finally out of it. What a waste of time and money!

People that love open models dramatically overstate how good the benchmaxxed open models are. They are nowhere near Opus.

slopinthebag 2 hours ago | parent | prev [-]

> We need more voices like this to cut through the bullshit.

Just because you can't figure out how to use the open models effectively doesn't mean they're bullshit. It just takes more skill and experience to use them :)

efficax 2 hours ago | parent | prev | next [-]

gemma4 and qwen3.6 are pretty capable, but they will be slower and wrong more often than the larger models. Still, you can connect gemma4 to opencode via ollama and it... works! It really can write and analyze code. It's just slow. You need serious hardware to run these fast, and even then, they're too small to beat the "frontier" models right now. But it's early days.
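For anyone curious what the ollama route looks like in practice, the wiring is just an HTTP endpoint on localhost. A rough sketch (the model tag mirrors the comment above; substitute whatever tag you actually have pulled):

```shell
# Pull a model, then query ollama's local HTTP API directly.
# The "gemma4" tag matches the model named above -- use any
# tag your ollama install actually has available.
ollama pull gemma4
curl -s http://localhost:11434/api/generate \
  -d '{"model": "gemma4", "prompt": "Summarize what a B-tree is in two sentences.", "stream": false}'
```

Coding tools generally talk to this same local server (ollama also exposes an OpenAI-compatible endpoint), so pointing the tool at localhost:11434 is usually most of the "integration."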

mfro 6 hours ago | parent | prev | next [-]

Not sure why all the other commenters are failing to mention that you can spend considerably less money on an Apple Silicon machine to run decent local models.

Fun fact: AWS offers Apple Silicon EC2 instances you can spin up to test.

__mharrison__ 5 hours ago | parent | prev | next [-]

My anecdotal experience from a recent project (a Python library implemented and released to PyPI):

I took the plan that I used from Codex and handed it to opencode with Qwen 3.5 running locally.

It created a library very similar to Codex's, but took 2x as long.

I haven't tried Qwen 3.6 but I hear it's another improvement. I'm confident with my AI skills that if/when cheap/subsidized models go away, I'll be fine running locally.

bakugo 6 hours ago | parent | prev | next [-]

You should be aware that any model you can run on less than $10k worth of hardware isn't going to be anywhere close to the best cloud models on any remotely complex task.

Many providers out there host open weights models for cheap, try them out and see what you think before actually investing in hardware to run your own.

hleszek 7 hours ago | parent | prev | next [-]

The latest Qwen3.6 model is very impressive for its size. Get an RTX 3090 and go to https://www.reddit.com/r/LocalLLaMA/ to see the latest news on how to run models locally. Totally fine for coding.

aray07 7 hours ago | parent | prev | next [-]

i think the new qwen models are supposed to be good based on some of the articles that i read

DeathArrow 6 hours ago | parent | prev [-]

Unless you use an H100 or 4x 5090s, you won't get decent output.

The best bang for the buck right now is subscribing to token plans from Z.ai (GLM 5.1), MiniMax (MiniMax M2.7), or Alibaba Cloud (Qwen 3.6 Plus).

Running quantized models won't give you results comparable to Opus or GPT.