Have you personally used any of the latest batch of even smaller local models? They certainly don't beat SotA models at coding... but with a good harness they are able to achieve things with SotA that I couldn't last year.

I've repeatedly given local models non-trivial projects involving research and coding, which they've successfully completed with minimal intervention from me (almost exclusively reviewing the results). Again, nothing comparable with current SotA, but definitely tasks I could not have given SotA models last year (without an agent harness).

Now that pure progress from these models seems to have slowed down, we're seeing a ton of options both for making models more efficient and for tooling that helps improve them (everything from agent harnesses to RLVR).

That's just looking at "what can small models do today". When you look at what's possible with larger open models that are still much smaller than SotA from the major providers, their performance is extremely close to SotA, close enough that for personal projects I'll just use Kimi instead of any Anthropic offerings.

So it's not terribly hard to imagine a solution in the middle happening within a few years. We still have tons to learn about optimal sizes of these models and how to build them with maximal efficiency (and we've already seen a lot of recent improvements in this space).

maccard 4 hours ago

> but with a good harness they are able to achieve things with SotA that I couldn't last year.

What happens if you run last year's model in a SOTA harness? IME, the quality of the harness has a much more significant impact on the quality of the result, once you get past the initial hump of “can it do anything at all”.

	windexh8er 3 hours ago
		I think this is a big component, but so is context. A large factor in any model being able to handle complexity comes down to context length. I think multiple SLMs driven by an orchestration framework (harness or otherwise) will ultimately displace LLMs. Right now we're in the era of diminishing returns with respect to LLM gains. Moving the needle by a few percent doesn't excite as many people anymore, and with "reasoning" capabilities there's no reason why small distributed models can't be run more efficiently, especially if/when we start to see gains in modularized context management solutions.
	mswphd 2 hours ago
		sure, but high-quality harnesses require less GPU compute/VRAM, and plausibly can be used locally by most users.

trees101 40 minutes ago

can you please share details about your harness?

sixothree 4 hours ago

Can you spare a sentence or two describing your local setup?

	theplatman 3 hours ago
		biggest thing i wish was present in more discussions about models is people providing more specifics on their setups vs. vague descriptions of harnesses