Yes. Llama.cpp + Qwen3.6-35b (MTP) + OpenCode is quite capable and runs on a single RTX 3090 and is faster than most cloud models. Quality is like running edge models from 8-12 months ago. Setup details at https://github.com/pierotofy/LocalCodingLLM/

▲

jacobgold 5 hours ago | parent | next [-]

"Quality is like running edge models from 8-12 months ago."

That sounds great for hobbyists but IMHO it wasn't until Opus 4.6 was released six months go (Dec 25, 2025) that we had a model good enough for professionals to use as a primary driver of their coding agents. That seems to be the threshold worth aiming for.

▲

sbrother 4 hours ago | parent | next [-]

I strongly agree on that being the release where these tools got good enough to substantially speed up my professional work. I have to admit I was super skeptical of AI coding until then.

	▲	dnautics 4 hours ago \| parent [-]
		for me (might be because of the language im using) i had a substantial bump around september and a huge bump around January. in my stuff now i use an OT library that claude put finishing touches on in September.

▲

Projectiboga 4 hours ago | parent | prev | next [-]

So thalen it might be 6-8 months to get to useable on a local open model? Of course state of the art will be a year ahead, a generation at the current pace.

▲

pierotofy 5 hours ago | parent | prev [-]

I use it for work.

▲

jacobgold 5 hours ago | parent [-]

That's cool if you prefer it, but it is hard to imagine it being a strictly rational choice when much better quality is available at a price that is small relative to the cost of an employee. Or is there something specific about your use-case?

▲

vector_spaces 4 hours ago | parent | next [-]

Not all work requires every facet to be so sharply optimized, and there may be other constraints that are completely invisible to you. Some that were easy for me to imagine: the parent works in a heavily regulated industry, their IT team is slow-moving and paranoid and this is a safe, under-the-radar workaround, the output is "good enough" for their purposes and they find tinkering with it to be fun.

Regardless I don't think it's fruitful to be so condescending with such little insight into this person's situation. Even if you had total insight -- let people be and withhold your judgement, or at least keep it to yourself. Making people feel stupid is a great way to turn people off to pretty much anything else you have to say

▲

lokar 4 hours ago | parent | prev | next [-]

Won’t it depend on what you use it for? A less capable system might be fine for boilerplate, moderate re-factoring, etc. Not everyone is building whole features in one go.

▲

pierotofy 4 hours ago | parent | prev [-]

To me, what's not rational is believing you must rent the tools of your trade while exposing all of your employer's intellectual property to a third party. Difference of opinion.

	▲	jacobgold 4 hours ago \| parent [-]
		It's not my opinion that you "must" rent tools but it certainly is the pragmatic choice in 2026. I would be as happy as anyone for this situation to change and I expect it to at some point.

▲

trueno 5 hours ago | parent | prev | next [-]

i have a 128gb m4 max macbook pro i've been wanting to tinker with this stuff but genuinely never find the time. any mac users in here running similar to the above that can share their experience?

i always see great debates with local stuff but the space is constantly moving goalposts and all the vernacular is pretty unfamiliar to me. i'd love to understand what people with objective experience feel they've traded away (or gained) when going local so i can determine for myself if these things are a good fit.

▲

brycesub 4 hours ago | parent | next [-]

If you have a 128GB Mac you really ought to try out: https://github.com/antirez/ds4 by the creator of redis. This is probably as close to it gets to state-of-the-art local LLM + agentic coding.

	▲	__mharrison__ 2 hours ago \| parent \| next [-]
		Using this just this morning on my DGX Spark. A little slower than frontier models but my $200/mo weekly usage exhausted with 3 days left on the week... (Shouldn't have done that refactoring job in high mode)
	▲	trueno 2 hours ago \| parent \| prev \| next [-]
		well this is supremely interesting thanks for putting it on my radar
	▲	lostlogin 4 hours ago \| parent \| prev [-]
		Thank you.

▲

htrp 4 hours ago | parent | prev | next [-]

Use your ClaudeCode sub and tell it to set it up for you

▲

dirkolbrich an hour ago | parent | prev [-]

I have the same machine. You might look into https://omlx.ai/ a „macOS-native MLX server“. pi.dev for the agent with MCP, web-search and sub-agents extension.

▲

atomicnumber3 5 hours ago | parent | prev | next [-]

Same. I have no desire to use Claude at all anymore.

▲

pierotofy 5 hours ago | parent [-]

Yep. Screw Anthropic, CloseAI and all other rent seekers in this space.

▲

akulbe 4 hours ago | parent [-]

I have an M2 Max MBP with 96GB of RAM. What models and setup would you use for this kind of configuration?

	▲	monirmamoun 2 hours ago \| parent [-]
		download LM Studio to play with, and it will let you search for models... try Qwen3.6-35B-A3B at 4,5 or 6 bits (6 bit XL is near perfect) and use pi coder or another harness to access it... you can also try Unsloth studio and try same model to start. LM Studio slighter easier to use, Unsloth probably better quality. Neither one is super great quality by the way (meaning: they crash or act weirdly too often to be full production solutions, but can work for local coding). ONCE YOU DOWNLOAD EITHER APP... it will let you search huggingface for the models. Just type qwen to start looking and ... start messing around. And you connect the pi coder harness using the http interface that LM Studio and Unsloth offer to the engine API, so make sure you figure out that url and turn it on... something like 127.0.0.1:1234/api would be a typical IP (localhost) and port (1234 is used by LM Studio)

▲

daveidol 4 hours ago | parent | prev | next [-]

Do you do your dev work on the windows machine (referenced in the docs), or do you remotely access it from a separate machine? I ask because I have a RTX 3090 kicking around in a gaming desktop, but I don't use it for any dev work (I use a Macbook Pro).

	▲	snake_n_my_boot 2 hours ago \| parent [-]
		I have a similar set up and have been using it to learn and tinker with open models. I run Ollama on the gaming desktop and point OpenCode to it from my MacBook. Works nicely for me so far.

▲

lelandbatey 5 hours ago | parent | prev | next [-]

I use it, it's good, I get work done, but know that they really mean it when they say

> "Quality is like running edge models from 8-12 months ago"

Don't expect Opus, expect more like Haiku. If you micromanage it, you'll get great results. If you want it to be a human in a box, it'll flounder.

▲

dheera 5 hours ago | parent | prev | next [-]

Am I doing something wrong or has ollama become shittified?

I'm looking at https://ollama.com/search and the top few models like kimi-k2.7-code say "cloud" and I can't seem to ollama pull them.

I thought the whole POINT of ollama was not-cloud?

	▲	hoherd 4 hours ago \| parent \| next [-]
		I experienced the same situation a month or two ago. One of my friends sent me this article that was illuminating. https://sleepingrobots.com/dreams/stop-using-ollama/
	▲	satvikpendem 5 hours ago \| parent \| prev \| next [-]
		Ollama is not recommended to be used. Use llama.cpp.
	▲	jmorgan 4 hours ago \| parent \| prev \| next [-]
		The larger models are available on Ollama's cloud as most folks don't have the hardware to run 500B-1T parameter models.
	▲	jubilanti 3 hours ago \| parent \| prev \| next [-]
		> I thought the whole POINT of ollama was not-cloud? It was at first, then the developers realized they had a massive userbase they could monetize. A tale as old as open source...
	▲	toyg 4 hours ago \| parent \| prev [-]
		Yes, you've nailed it. Ollama are desperately trying to pull a Cursor - like 3791 other projects in this space.

▲

dominotw 5 hours ago | parent | prev [-]

how much does the setup cost if i want to buy all the hardware now and increased power costs?