For coding assistance, I have tried OpenCode with several large open models through OpenRouter. All were fairly bad compared to Claude Opus. Could you provide some hints on how I should be holding these open models so that I might get more value out of them?

I agree with the common trope that open models lag behind by about a year, but something magical happened about a year ago, when the state-of-the-art models became extremely useful. By this reasoning we're about to see open models perform well, but I'm afraid there is more to it than just waiting for another revolution around the sun.

Note, my application is coding assistance. Open models can be great for other purposes.

tariky 2 hours ago

I have tried almost all open-source (OS) models on OpenCode; none of them is on the level of Opus 4.7.

In my latest experiment I used Opus for the implementation plan, then Cursor Composer 2.5 for execution.

I must say that combo is really good. The main drawback of Claude Code is that it is super slow, so when it is paired with Composer, which is super fast, it flies.

	cainxinth 2 hours ago
		No one is claiming that OS is as good. They are saying it isn't that far behind SOTA commercial products. So why pay exorbitantly just to get something only a few percent better than the free option? But there have been very good open source office apps for decades and few enterprises use them, so perhaps this is just the nature of B2B purchasing committees and 'nobody getting fired for buying IBM.'

slopinthebag 2 hours ago

Do more planning yourself, be smart about the context, break down tasks into smaller components, give it more guidance. You can't just lazily prompt it to complete large features autonomously and expect good results.

aniceperson 20 minutes ago

A good harness is supposed to do what you are describing. Sonnet on pi.dev is pretty terrible but fast. Claude Code has ridiculous amounts of prompt engineering at the system-prompt level and sub-session spawning, combined with low temperature, to provide the predictable results people like. CC screws up and you never see it, because the harness auto-corrects, while with OSS you see everything, and it does not come with that level of monitoring by default.
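
To make the "harness auto-corrects" idea concrete, here is a minimal sketch of a generate, check, retry loop. It is an illustration only, not how Claude Code or OpenCode actually work. The names `run_with_auto_correction`, `syntax_check`, and `make_fake_model` are hypothetical, and the "model" is a canned stub, so the example runs without any API.

```python
from typing import Callable, Optional, Tuple


def run_with_auto_correction(
    task: str,
    generate: Callable[[str], str],
    check: Callable[[str], Optional[str]],
    max_attempts: int = 3,
) -> Tuple[str, int]:
    """Call `generate` until `check` passes; feed each failure back into the prompt.

    Returns the passing candidate and the number of attempts it took.
    """
    prompt = task
    for attempt in range(1, max_attempts + 1):
        candidate = generate(prompt)
        error = check(candidate)
        if error is None:
            # The user only ever sees this passing result, not the failed tries.
            return candidate, attempt
        prompt = f"{task}\n\nYour previous attempt failed:\n{error}\nPlease fix it."
    raise RuntimeError(f"no passing result after {max_attempts} attempts")


def syntax_check(source: str) -> Optional[str]:
    """Return an error message if `source` is not valid Python, else None."""
    try:
        compile(source, "<candidate>", "exec")
    except SyntaxError as exc:
        return f"SyntaxError: {exc.msg} (line {exc.lineno})"
    return None


def make_fake_model() -> Callable[[str], str]:
    """Hypothetical stand-in for a model: a broken reply first, then a fixed one."""
    replies = iter([
        "def add(a, b) return a + b",          # missing colon
        "def add(a, b):\n    return a + b",    # valid
    ])
    return lambda prompt: next(replies)


result, attempts = run_with_auto_correction(
    "Write add(a, b).", make_fake_model(), syntax_check
)
print(f"passed after {attempts} attempt(s)")
```

In a real setup the check would more likely be a test suite, linter, or type checker than a bare syntax check, and the failed attempts are exactly what a bare-bones harness would leave visible to you.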

amilios 2 hours ago

But if the closed-source models can do this without the additional effort, that's a significant gap, no?

10000truths 2 hours ago

The point is that the price gap is so much larger than the capability gap that, even with the extra compute needed to make up for the lack of capability, you can still come out ahead in terms of amortized $/work done.
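
A rough sketch of that arithmetic, assuming each attempt succeeds independently with a fixed probability, so the expected number of attempts per success is 1 / success rate. The prices and success rates below are made-up placeholders, not measurements of any real model, and `cost_per_completed_task` is a hypothetical helper.

```python
def cost_per_completed_task(price_per_attempt: float, success_rate: float) -> float:
    """Expected spend to get one successful result, assuming independent retries."""
    if price_per_attempt < 0:
        raise ValueError("price_per_attempt must be non-negative")
    if not 0.0 < success_rate <= 1.0:
        raise ValueError("success_rate must be in (0, 1]")
    # Attempts until the first success follow a geometric distribution
    # with mean 1 / success_rate.
    return price_per_attempt / success_rate


# Hypothetical numbers: the closed model is 20x the price per attempt
# but succeeds twice as often.
closed = cost_per_completed_task(price_per_attempt=1.00, success_rate=0.90)
open_ = cost_per_completed_task(price_per_attempt=0.05, success_rate=0.45)

print(f"closed: ${closed:.2f} per completed task")
print(f"open:   ${open_:.2f} per completed task")
print(f"closed / open: {closed / open_:.1f}x")
```

Under these assumptions the open model stays cheaper per completed task as long as its success rate is above 0.90 × 0.05 / 1.00 = 0.045. The sketch ignores the human time spent on extra planning and review, which is the cost the rest of the thread is arguing about.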

flexagoon an hour ago

Is it really, when they are hundreds of times more expensive?

eikenberry an hour ago

That is the 3-6 month SOTA-to-open gap people talk about, a time window that keeps moving as new models are released on both sides.

bigfishrunning 2 hours ago

See, that's the thing: they can't. Every model needs hand-holding and guidance.

	amilios an hour ago
		Some require less hand-holding than others, though.

eikenberry 2 hours ago

+1 .. just wanted to reiterate that this is the answer. The open models work great if you just do a little more of the design/architectural work up front and organize your work appropriately.