Opus 4.5 has become really capable.

Not in terms of knowledge. That was already phenomenal. But in its ability to act independently: to make decisions, collaborate with me to solve problems, ask follow-up questions, write plans and actually execute them.

You have to experience it yourself on your own real problems and over the course of days or weeks.

Every coding problem I was able to define clearly enough within the limits of the context window, the chatbot could solve, and these weren’t easy problems. It wasn’t just about writing and testing code; it also involved reverse engineering and cracking encoding-related problems. The most impressive part was how actively it worked on problems in a tight feedback loop.

In the traditional sense, I haven’t really coded privately at all in recent weeks. Instead, I’ve been guiding and directing, having it write specifications, and then refining and improving them.

Curious how this will perform in complex, large production environments.

s-macke 2 days ago

Just some examples I’ve already made public. More complex ones are in the pipeline. With [0], I’m trying to benchmark different coding agents. With [1], I successfully reverse-engineered an old C64 game using only Opus 4.5.

Yes, feel free to blame me for the fact that these aren’t very business-realistic.

[0] https://github.com/s-macke/coding-agent-benchmark

[1] https://github.com/s-macke/weltendaemmerung

lelanthran 2 days ago

> You have to experience it yourself on your own real problems and over the course of days or weeks.

How do you stop it from over-engineering everything?

petcat 2 days ago

This has always been my problem, whether it's Gemini, OpenAI, or Claude. Unless you hand-hold it to an extreme degree, it is going to build a mountain next to a molehill.

It may end up working, but it is going to convolute APIs and abstractions and mix patterns basically everywhere.

jama211 2 days ago

Not in my experience - you need to build the fact that you don’t want it to do that into your design and specification.

petcat 2 days ago

Sure, I can tell it not to do that, but it doesn't know what that is. It's a je ne sais quoi.

I can't teach it taste.

	dflock 2 days ago
		In an existing codebase, recent Claude will mostly just look at your code and copy what you've been doing, without being asked. In a new codebase, you can just ask it to "be concise, keep it simple" or something.

spaceman_2020 2 days ago

It's very good at following instructions. You can build dedicated agents for different tasks (backend, API design, database design) and make it follow design and coding patterns.

It's verbose by default, but with a few hours spent on custom instructions you can make it code just like anyone.

	svieira 2 days ago
		> just like anyone

		Arthur Whitney? https://en.wikipedia.org/wiki/Arthur_Whitney_(computer_scien...

s-macke 2 days ago

Difficult, and it really depends on the complexity. I definitely work in a spec-driven way, with a step-by-step implementation phase. If it goes the wrong way, I prefer to rewrite the spec and throw away the code.

ryanchants 2 days ago

I have it propose several approaches, pick and choose from each, and remove what I don't want done. "Use the general structure of A, but use the validation structure of D. Using a view translation layer is too much, just rely on FastAPI/SQLModel's implicit view conversion."

	dbbk 2 days ago
		Plan mode already does this: it makes multiple plans and then synthesises them.

bdangubic 2 days ago

“Everything Should Be Made as Simple as Possible, But Not Simpler” should be the ending of every prompt :)

verdverm 2 days ago

Put instructions in the system prompt telling it not to do that.

Once more people realize how easy it is to customize and personalize your agent, I hope they will move beyond the cookie-cutter defaults that Big AI like Anthropic and Google give you.

I suspect most won't, though, because (1) it means you have to write human language, communicate, and practice this weird form of persuasion, and (2) AI is going to make a bunch of them lazy, and Big AI has sold them on magic solutions that require no effort on their part (not true: there is a lot of customizing to do, and it pays huge dividends).

myvoiceismypass 2 days ago

I personally try to narrow scope as much as possible to prevent this. If a human hands me a PR that is not digestible size-wise and content-wise (to me), I am not reviewing and merging it. The same goes for what Claude generates with my guidance.

jghn 2 days ago

I find my sweet spot is using the Claude web app as a rubber duck as well as feeding it snippets of code and letting it help me refine the specific thing I'm doing.

When I use Claude Code, I find that it *can* add a tremendous amount of capability because it can see my entire codebase at once. The issue is that when I'm doing something where seeing the entire codebase would help, it blasts through my quota too fast. And if I'm tightly scoping it, it's just as easy and faster for me to use the website.

Because of this I've shifted back to the website. I find that I get more done faster that way.

pigpop 17 hours ago

I've had similar experiences, but I've been able to start using Claude Code for larger projects by refactoring with the goal of making the codebase understandable from its interfaces alone. This, along with instructions to prefer reading a module's interface unless working directly on that module's implementation, seems to allow further progress within session limits.
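As an illustration of the interface-first idea, here is a minimal sketch of a script that lists only the public signatures in a module, so that a summary rather than the full implementation can be put in front of the agent. This is not the commenter's actual tooling: the names `interface_summary` and `_signature` are hypothetical, and the sketch assumes a Python codebase and Python 3.9+ (for `ast.unparse`).

```python
import ast


def _signature(fn) -> str:
    """Render a function node as a one-line signature with no body."""
    prefix = "async def" if isinstance(fn, ast.AsyncFunctionDef) else "def"
    args = ast.unparse(fn.args)
    returns = f" -> {ast.unparse(fn.returns)}" if fn.returns else ""
    return f"{prefix} {fn.name}({args}){returns}"


def interface_summary(source: str) -> list[str]:
    """List public top-level functions, classes, and public methods.

    Names starting with an underscore are treated as private and skipped
    (an assumption of this sketch, not a rule from the thread).
    """
    func_types = (ast.FunctionDef, ast.AsyncFunctionDef)
    lines = []
    for node in ast.parse(source).body:
        if isinstance(node, func_types) and not node.name.startswith("_"):
            lines.append(_signature(node))
        elif isinstance(node, ast.ClassDef) and not node.name.startswith("_"):
            lines.append(f"class {node.name}:")
            for item in node.body:
                if isinstance(item, func_types) and not item.name.startswith("_"):
                    lines.append("    " + _signature(item))
    return lines
```

Running `interface_summary` on the text of each source file and joining the results with newlines gives a compact outline that fits in far less context than the implementations themselves.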

fragmede 2 days ago

By "the website", do you mean you're copy-pasting, or are you using the code system where Anthropic clones your code from GitHub and interacts with it in a VM/container for you?

jghn 2 days ago

Just pasting code snippets, and occasionally an entire file or two, into the main claude.com site. I usually already know what I want and need, but I want to speed up the process of getting there and catch anything I might have missed along the way.

	zmmmmm 2 days ago
		Aider is a pretty good way to automate that. You can use it with Claude models. It lets you be completely precise, down to a single file, and sit in a chat/code/review loop, but it does a lot of the chores, like generating commit messages, while saving you the copy-paste effort.

giancarlostoro 2 days ago

> In the traditional sense, I haven’t really coded privately at all in recent weeks. Instead, I’ve been guiding and directing, having it write specifications, and then refining and improving them.

This is basically all my side projects.

jesse_dot_id 2 days ago

This has also been my experience.