aszen | a day ago
I don't think so; model improvements far outweigh any harness or tooling. Look at https://github.com/SWE-agent/mini-swe-agent for proof.
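
[For readers unfamiliar with mini-swe-agent: its point is that the harness can be tiny. Below is a rough, illustrative sketch of that kind of minimal agent loop, assuming the `openai` Python client and a single bash tool; the model name, prompts, and truncation limits are placeholders, not mini-swe-agent's actual code.]

```python
# Minimal agent loop in the spirit of mini-swe-agent: the whole "harness" is
# a conversation loop plus one bash tool. Hedged sketch only; assumes an
# OpenAI-compatible endpoint, and the model/prompt strings are placeholders.
import re
import subprocess
from openai import OpenAI

client = OpenAI()

def run_bash(command: str) -> str:
    """Run a shell command and return combined stdout/stderr, truncated."""
    result = subprocess.run(command, shell=True, capture_output=True,
                            text=True, timeout=120)
    return (result.stdout + result.stderr)[:10_000]

def agent(task: str, max_steps: int = 20) -> None:
    messages = [
        {"role": "system", "content": "Solve the task by emitting one bash "
         "command per turn inside a ```bash``` block. Say DONE when finished."},
        {"role": "user", "content": task},
    ]
    for _ in range(max_steps):
        reply = client.chat.completions.create(
            model="gpt-4o",  # placeholder; any capable model slots in here
            messages=messages,
        ).choices[0].message.content
        messages.append({"role": "assistant", "content": reply})
        if "DONE" in reply:
            break
        match = re.search(r"```bash\n(.*?)```", reply, re.DOTALL)
        if not match:
            messages.append({"role": "user", "content": "No command found."})
            continue
        output = run_bash(match.group(1))
        messages.append({"role": "user", "content": f"Output:\n{output}"})
```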

prodigycorp | a day ago
Yes, but people aren't choosing CC because they are necessarily performance maximalists. They choose it because it has features that make it behave much more nicely as a pair-programming assistant than mini-swe-agent. There's a reason Cursor poached Boris Cherny and Cat Wu and Anthropic hired them back!

aszen | a day ago
They nailed the UX, I would say, and the models themselves are a lot better even outside of CC.

prodigycorp | a day ago
I don't think I disagree with you about anything; I'm just splitting hairs at this point.

rfw300 | a day ago
Anyone who would choose 3.7 with a fancy harness has a very poor memory of how dramatically model capabilities have improved between then and now.

prodigycorp | a day ago
I'd be very interested in the performance of 3.7 decked out with web search, context7, a full suite of skills, and code-quality hooks against Opus 4.5 with none of those. I suspect it's closer than you think!

CuriouslyC | a day ago
Skills don't make any difference beyond having markdown files with instructions to point an agent to as needed. Context7 isn't any better than telling your agent to use trafilatura to scrape web docs for your libs, and having a linting/static-analysis suite isn't a harness thing.

3.7 was kinda dumb: it was good at vibe UIs but really bad at a lot of things, and it would lie and reward-hack a LOT. The difference with Opus 4.5 is that when you go off the Claude happy path, it holds together pretty well. With Sonnet (particularly <=4), if you went off the happy path things got bad in a hurry.
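
[An illustrative sketch of the "just use trafilatura" approach mentioned above, using trafilatura's fetch_url/extract API; the docs URL is only an example, not something from the thread.]

```python
# Hedged sketch of the "skip Context7, scrape the docs yourself" idea.
# Uses trafilatura's real fetch_url/extract functions; the URL is illustrative.
import trafilatura

def fetch_docs(url: str) -> str:
    """Download a docs page and return its main text, stripped of nav/boilerplate."""
    downloaded = trafilatura.fetch_url(url)
    if downloaded is None:
        raise RuntimeError(f"failed to fetch {url}")
    text = trafilatura.extract(downloaded, include_links=True)
    return text or ""

if __name__ == "__main__":
    # An agent would be told to call something like this and read the result.
    print(fetch_docs("https://docs.python.org/3/library/subprocess.html")[:2000])
```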

prodigycorp | a day ago
Yeah, 3.7 was pretty bad. I remember its warts vividly; it wanted to refactor everything. Not a great model on which to hinge this provocation. But skills do improve model performance: OpenAI posted examples showing skills massively juicing up results on some benchmarks.

nl | a day ago
> I suspect it's closer than you think!

It's not. I've done this (although not with all of these tools). For a reasonably sized project it's easy to tell the difference in quality between, say, Grok-4.1-Fast (30 on the AA Coding Index) and Sonnet 4.5 (37 on AA). Sonnet 3.7 scores 27; no way I'm touching that. Opus 4.5 scores 46, and it's easy to see that difference.

Give the models something with high cyclomatic complexity or complex dependency chains and Grok-4.1-Fast falls to bits, while Opus 4.5 solves things.
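
[A hedged sketch of one way to surface those high-cyclomatic-complexity targets for this kind of model comparison, assuming the radon package; the file path is illustrative.]

```python
# Hedged sketch: rank functions in a file by cyclomatic complexity using the
# radon package, to pick hard cases for comparing models. Path is illustrative.
from pathlib import Path
from radon.complexity import cc_visit

def hardest_functions(path: str, top_n: int = 10):
    """Return the top_n most complex functions/methods/classes in a Python file."""
    source = Path(path).read_text()
    blocks = cc_visit(source)  # one entry per function, method, or class
    ranked = sorted(blocks, key=lambda b: b.complexity, reverse=True)
    return [(b.name, b.complexity) for b in ranked[:top_n]]

if __name__ == "__main__":
    for name, score in hardest_functions("src/scheduler.py"):
        print(f"{score:3d}  {name}")
```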

nl | a day ago
This is SO wrong. I actually wrote my own simple agent (with some twists) in part so I could compare models. Opus 4.5 is in a completely different league from Sonnet 4.5, and 3.7 isn't even on the same planet. I happily use my agent with Opus, but there is no world in which I'd use a Sonnet 3.7-level model for anything beyond simple code completion.