| ▲ | blutoot 9 hours ago |
I think the opposite will happen - leadership will forgo this attitude of "reverse course on the first outage". Teams will figure out how to mitigate such situations in the future without sacrificing the potential upside of "fully autonomous code changes made on production systems" (e.g. invest more in a production-like env for test coverage). Software engineering purists have to get out of some of these religious beliefs.
|
| ▲ | verdverm 7 hours ago | parent | next [-] |
> Software engineering purists have to get out of some of these religious beliefs

To me, the Claude superfans like yourself are the religious ones, the way you run around proffering unsubstantiated claims like this and believe in / anthropomorphize way too much. Is it because "Anthropic" is an abbreviation of "Anthropomorphic"?
| |
| ▲ | blutoot 6 hours ago | parent | next [-] |
I would have been in the skeptics' camp 3-4 months ago. Opus 4.5 and GPT-5.2 changed my mind. I'm not talking about mere code completion; I am talking about these models AND the corresponding agents playing a really, really capable software engineer + tester + SRE/Ops role. The caveat is that, as things stand today, we have to be fairly good at steering them in the right direction, and it is exhausting to do it the right way.
| ▲ | verdverm 5 hours ago | parent [-] |
I agree the latest generation of models, Opus 4.5 and Gemini 3, are more capable. 5.2 is OpenAI squeezing as much as they can out of 4, because they haven't had a successful pre-training run since Ilya left.

I disagree that they are really, really capable engineers et al. They have moments where they shine like one. They also have moments where they perform worse than a new grad/hire. That is not what a really, really capable engineer looks like.

I don't see this fundamentally changing, even with all the improvements we are seeing. The problem is lower-level and more core than anything that adds more layers on top can resolve; the layers only paper over it as best they can.
| |
| ▲ | throwaway7783 6 hours ago | parent | prev [-] |
In my own anecdotal experience, Claude Code found a bug in production faster than I could. I was the author of said code, which I wrote by hand 4 years ago. The GP's claim is perhaps not all that unsubstantiated. My role is moving more towards QA/PM nowadays.
| ▲ | verdverm 6 hours ago | parent [-] |
I have many wins with AI; I also have many hard fails. This experience helps me understand where the limits are.

Do you have hard fails to share along with your wins? Or are we only going to share our wins, like stonk hussies?
| ▲ | throwaway7783 5 hours ago | parent [-] |
For sure. Not hard fails, but bad fixes. It confidently thought it had fixed a bug, but it really hadn't. Because the bug was fairly complex, I could only tell by trying to reproduce it before and after. Ultimately, I believe it wasn't given sufficient context. It has certainly failed to do what I asked in round 1 and round 2, but eventually got it right (a rendering issue for a barcode designer). These incidents have become less and less frequent over the last year - switching to Opus lowered the failure rate.

Same thing for code reviews. Most of the output is fluff, but it does give useful feedback if the instructions are good. For example, I asked for a blind code review of a PR ("Review this PR"), and it gave some generic commentary. I made the prompt more specific ("Follow the API changes across modules and see impact") - and it found a serious bug.

The number of times I had to give up in frustration has been going down over the last year. So I tend to believe a swarm of agents could do a decent job of autonomous development/maintenance over the next few years.
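A minimal sketch of that "make the prompt more specific" pattern, assuming the Claude Code CLI is installed and the PR's branch is checked out locally. The prompt wording below is illustrative, not the commenter's actual setup; `claude -p` runs a single non-interactive prompt and prints the reply:

    import subprocess

    # Illustrative only: compare a generic review prompt with a targeted one.
    generic_prompt = "Review this PR."
    targeted_prompt = (
        "Review this PR. Follow the API changes across modules: find every "
        "caller of the changed functions and check whether each call site "
        "still passes compatible arguments."
    )

    for prompt in (generic_prompt, targeted_prompt):
        # `claude -p <prompt>` runs one non-interactive prompt (print mode).
        result = subprocess.run(
            ["claude", "-p", prompt],
            capture_output=True,
            text=True,
            check=False,
        )
        print(result.stdout)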
|
|
|
|
| ▲ | majormajor 2 hours ago | parent | prev | next [-] |
Leadership will do what customers demand, which in most cases won't be ship-constantly-and-just-mitigate. Finding problems through testing before they hit production has been an unsolved problem for decades, sadly.
|
| ▲ | jauntywundrkind 4 hours ago | parent | prev [-] |
Even lesser agents are incredibly good and incredibly fast at using tools to inspect the system, come up with ideas for things to check, and check them. I absolutely agree: we will 100% give the agents far more power. A browser, a debugger for the server that works with that browser instance, a database tool, an OpenTelemetry tool. Teams are going to figure out how to mitigate bad deploys by using even more AI & giving it even better information gathering.
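A minimal sketch of that "give the agent more tools" idea, in Python. The ToolRegistry and the stub tools below are hypothetical, not any real agent framework's API; the point is that each tool wraps one read-only information-gathering surface, so the agent can inspect a bad deploy without mutating anything:

    from typing import Callable

    class ToolRegistry:
        """Maps tool names to callables an agent loop can invoke by name."""

        def __init__(self) -> None:
            self._tools: dict[str, Callable[..., str]] = {}

        def register(self, name: str, fn: Callable[..., str]) -> None:
            self._tools[name] = fn

        def call(self, name: str, **kwargs: object) -> str:
            return self._tools[name](**kwargs)

    # Hypothetical read-only diagnostic tools; bodies are stubs.
    def query_database(sql: str) -> str:
        """Run a read-only query against a replica, never the primary."""
        return f"(stub) rows for: {sql}"

    def fetch_traces(service: str, minutes: int = 15) -> str:
        """Pull recent OpenTelemetry traces for one service."""
        return f"(stub) traces for {service}, last {minutes} min"

    def open_page(url: str) -> str:
        """Load a page in a headless browser and return what rendered."""
        return f"(stub) DOM snapshot of {url}"

    registry = ToolRegistry()
    registry.register("db", query_database)
    registry.register("traces", fetch_traces)
    registry.register("browser", open_page)

    # An agent would pick tools by name while investigating, e.g.:
    print(registry.call("traces", service="checkout", minutes=30))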