Do more planning yourself, be smart about the context, break down tasks into smaller components, give it more guidance. You can't just lazily prompt it to complete large features autonomously and expect good results.

aniceperson 20 minutes ago | parent | next [-]

A good harness is supposed to do what you are describing. Sonnet on pi.dev is pretty terrible but fast. Claude Code has ridiculous amounts of prompt engineering at the system-prompt level and sub-session spawning, combined with low temperature, to provide the predictable results people like. CC screws up and you never see it, because the harness auto-corrects, while on OSS you see everything, and it does not come with that level of monitoring by default.

amilios 2 hours ago | parent | prev | next [-]

But if the closed-source models can do this without the additional effort, that's a significant gap, no?

10000truths 2 hours ago | parent | next [-]

The point is that the price gap is so much larger than the capability gap that, even with the extra compute needed to make up for the lack of capability, you can still come out ahead in terms of amortized $/work done.
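
A rough sketch of that arithmetic, assuming independent retries until an attempt succeeds. Every number here is a made-up placeholder rather than real pricing or benchmark data, and `cost_per_completed_task` is a hypothetical helper, not anything from a real tool:

```python
# Illustrative only: prices and success rates are invented placeholders.

def cost_per_completed_task(price_per_attempt: float, success_rate: float) -> float:
    """Expected spend to get one successful result when retrying until success.

    With independent attempts, the expected number of attempts is 1 / success_rate,
    so the expected cost is price_per_attempt / success_rate.
    """
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return price_per_attempt / success_rate

# Hypothetical closed model: 20x the price per attempt, 2x the success rate.
closed = cost_per_completed_task(price_per_attempt=1.00, success_rate=0.8)
# Hypothetical open model: cheap per attempt, but fails more often.
open_ = cost_per_completed_task(price_per_attempt=0.05, success_rate=0.4)

print(f"closed: ${closed:.3f} per completed task")
print(f"open:   ${open_:.3f} per completed task")
```

This only counts compute spend. It leaves out the human time spent on extra planning and review, which is what the rest of the thread is arguing about.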

flexagoon an hour ago | parent | prev | next [-]

Is it really, when they are hundreds of times more expensive?

eikenberry an hour ago | parent | prev | next [-]

That is the 3-6 month gap between SOTA and open models that people talk about, a time window that continues to move as new models are released on both sides.

bigfishrunning 2 hours ago | parent | prev [-]

See, that's the thing: they can't. Every model needs hand-holding and guidance.

	amilios an hour ago | parent [-]

	Some require less hand-holding than others, though.

eikenberry 2 hours ago | parent | prev [-]

+1. Just wanted to reiterate that this is the answer. The open models work great if you just do a little more of the design/architectural work up front and organize your work appropriately.