aszen | a day ago
I don't think so; model improvements far outweigh any harness or tooling. Look at https://github.com/SWE-agent/mini-swe-agent for proof.
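
[For readers unfamiliar with mini-swe-agent: its point is that the harness can be tiny. Below is a rough, illustrative sketch of that kind of minimal agent loop, assuming the `openai` Python client and a single bash tool; the model name, prompts, and truncation limits are placeholders, not mini-swe-agent's actual code.]

```python
# Minimal agent loop in the spirit of mini-swe-agent: the whole "harness" is
# a conversation loop plus one bash tool. Hedged sketch only; assumes an
# OpenAI-compatible endpoint, and the model/prompt strings are placeholders.
import re
import subprocess
from openai import OpenAI

client = OpenAI()

def run_bash(command: str) -> str:
    """Run a shell command and return combined stdout/stderr, truncated."""
    result = subprocess.run(command, shell=True, capture_output=True,
                            text=True, timeout=120)
    return (result.stdout + result.stderr)[:10_000]

def agent(task: str, max_steps: int = 20) -> None:
    messages = [
        {"role": "system", "content": "Solve the task by emitting one bash "
         "command per turn inside a ```bash``` block. Say DONE when finished."},
        {"role": "user", "content": task},
    ]
    for _ in range(max_steps):
        reply = client.chat.completions.create(
            model="gpt-4o",  # placeholder; any capable model slots in here
            messages=messages,
        ).choices[0].message.content
        messages.append({"role": "assistant", "content": reply})
        if "DONE" in reply:
            break
        match = re.search(r"```bash\n(.*?)```", reply, re.DOTALL)
        if not match:
            messages.append({"role": "user", "content": "No command found."})
            continue
        output = run_bash(match.group(1))
        messages.append({"role": "user", "content": f"Output:\n{output}"})
```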

prodigycorp | a day ago
Yes, but people aren't choosing CC because they are necessarily performance maximalists. They choose it because it has features that make it behave much more nicely as a pair-programming assistant than mini-swe-agent. There's a reason Cursor poached Boris Cherny and Cat Wu and Anthropic hired them back!

aszen | a day ago
They nailed the UX, I would say, and the models themselves are a lot better even outside of CC.

prodigycorp | a day ago
I don't think I disagree with you about anything; I'm just splitting hairs at this point.

rfw300 | a day ago
Anyone who would choose 3.7 with a fancy harness has a very poor memory of how dramatically model capabilities have improved between then and now.

prodigycorp | a day ago
I'd be very interested in the performance of 3.7 decked out with web search, context7, a full suite of skills, and code-quality hooks against Opus 4.5 with none of those. I suspect it's closer than you think!

CuriouslyC | a day ago
Skills don't make any difference beyond having markdown files with instructions to point an agent to as needed. Context7 isn't any better than telling your agent to use trafilatura to scrape web docs for your libs, and having a linting/static-analysis suite isn't a harness thing.

3.7 was kinda dumb: it was good at vibe UIs but really bad at a lot of things, and it would lie and reward-hack a LOT. The difference with Opus 4.5 is that when you go off the Claude happy path, it holds together pretty well. With Sonnet (particularly <=4), if you went off the happy path things got bad in a hurry.
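
[An illustrative sketch of the "just use trafilatura" approach mentioned above, using trafilatura's fetch_url/extract API; the docs URL is only an example, not something from the thread.]

```python
# Hedged sketch of the "skip Context7, scrape the docs yourself" idea.
# Uses trafilatura's real fetch_url/extract functions; the URL is illustrative.
import trafilatura

def fetch_docs(url: str) -> str:
    """Download a docs page and return its main text, stripped of nav/boilerplate."""
    downloaded = trafilatura.fetch_url(url)
    if downloaded is None:
        raise RuntimeError(f"failed to fetch {url}")
    text = trafilatura.extract(downloaded, include_links=True)
    return text or ""

if __name__ == "__main__":
    # An agent would be told to call something like this and read the result.
    print(fetch_docs("https://docs.python.org/3/library/subprocess.html")[:2000])
```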

prodigycorp | a day ago
Yeah, 3.7 was pretty bad. I remember its warts vividly; it wanted to refactor everything. Not a great model on which to hinge this provocation. But skills do improve model performance: OpenAI posted examples showing skills massively juicing up results on some benchmarks.

nl | a day ago
> I suspect it's closer than you think!

It's not. I've done this (although not with all of these tools). For a reasonably sized project it's easy to tell the difference in quality between, say, Grok-4.1-Fast (30 on the AA Coding Index) and Sonnet 4.5 (37 on AA). Sonnet 3.7 scores 27; no way I'm touching that. Opus 4.5 scores 46, and it's easy to see that difference.

Give the models something with high cyclomatic complexity or complex dependency chains and Grok-4.1-Fast falls to bits, while Opus 4.5 solves things.
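
[A hedged sketch of one way to surface those high-cyclomatic-complexity targets for this kind of model comparison, assuming the radon package; the file path is illustrative.]

```python
# Hedged sketch: rank functions in a file by cyclomatic complexity using the
# radon package, to pick hard cases for comparing models. Path is illustrative.
from pathlib import Path
from radon.complexity import cc_visit

def hardest_functions(path: str, top_n: int = 10):
    """Return the top_n most complex functions/methods/classes in a Python file."""
    source = Path(path).read_text()
    blocks = cc_visit(source)  # one entry per function, method, or class
    ranked = sorted(blocks, key=lambda b: b.complexity, reverse=True)
    return [(b.name, b.complexity) for b in ranked[:top_n]]

if __name__ == "__main__":
    for name, score in hardest_functions("src/scheduler.py"):
        print(f"{score:3d}  {name}")
```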

nl | a day ago
This is SO wrong. I actually wrote my own simple agent (with some twists) in part so I could compare models. Opus 4.5 is in a completely different league from Sonnet 4.5, and 3.7 isn't even on the same planet. I happily use my agent with Opus, but there is no world in which I'd use a Sonnet 3.7-level model for anything beyond simple code completion.