Soerensen | 7 hours ago
The agent orchestration point from vessenes is interesting - using faster, smaller models for routine tasks while reserving frontier models for complex reasoning. In practice, I've found the economics work like this:

1. Code generation (boilerplate, tests, migrations) - smaller models are fine, and latency matters more than peak capability
2. Architecture decisions, debugging subtle issues - worth the cost of frontier models
3. Refactoring existing code - the model needs to "understand" before changing, so context and reasoning matter more

The 3B active parameters claim is the key unlock here. If this actually runs well on consumer hardware with reasonable context windows, it becomes the obvious choice for category 1 tasks. The question is whether the SWE-Bench numbers hold up for real-world "agent turn" scenarios where you're doing hundreds of small operations.
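A minimal sketch of the tiered routing described above, assuming a simple dispatch by task category; the model names, context limits, and the route() helper are illustrative placeholders, not from any specific framework:

```python
from dataclasses import dataclass
from enum import Enum, auto


class TaskKind(Enum):
    CODEGEN = auto()       # boilerplate, tests, migrations
    ARCHITECTURE = auto()  # design decisions, subtle debugging
    REFACTOR = auto()      # needs broad context before editing


@dataclass
class ModelChoice:
    name: str          # hypothetical model identifier
    max_context: int   # assumed context window, tokens


def route(task: TaskKind) -> ModelChoice:
    """Pick a model tier per task category (assumed names and limits)."""
    if task is TaskKind.CODEGEN:
        # Latency and cost dominate; a small local model is assumed adequate.
        return ModelChoice("small-local-3b-active", max_context=32_000)
    if task is TaskKind.REFACTOR:
        # Context window matters more than peak reasoning here.
        return ModelChoice("mid-tier-long-context", max_context=128_000)
    # Architecture/debugging: pay for the frontier model.
    return ModelChoice("frontier-model", max_context=200_000)


if __name__ == "__main__":
    print(route(TaskKind.CODEGEN))
```

The design choice here is that routing is decided per task category up front rather than by retrying with a bigger model on failure; an escalation fallback would be a natural extension for category 1 tasks that miss.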
cirrusfan | 7 hours ago
I find it really surprising that you're fine with low-end models for coding - I went through a lot of open-weights models, local and "local", and I consistently found the results underwhelming. GLM-4.7 was the smallest model I found to be somewhat reliable, but that's a sizable 350B and stretches the definition of local-as-in-at-home.