mjr00 9 hours ago

> most orgs are used to responding to a daytime alert by calling out, “Who just shipped that change?” assuming that whoever merged the diff surely understands how it works and can fix it post-haste. What happens when nobody wrote the code you just deployed, and nobody really understands it?

I assume the first time this happens at any given company will be the moment they realize that fully autonomous code changes made on production systems by agents are a terrible idea, and that every change needs a human to take responsibility for and ownership of it, even if the changes were written by an LLM.

0xbadcafebee 3 hours ago | parent | next [-]

> every change needs a human to take responsibility for and ownership of it, even if the changes were written by an LLM

Actually, it could be the opposite: they hold the LLM responsible. When the code change breaks production, they'll just ask the LLM to fix it. If it can't? "Not my fault, the LLM wrote it, not me! We just need to improve our prompting next time!" Never underestimate humans' capacity to avoid doing work.

hippo22 9 hours ago | parent | prev | next [-]

What happens if the person who wrote the code went on vacation? What happens if the code is many years old and no current team member has touched the code?

Understanding code you didn't personally write is part of the job.

solid_fuel 2 hours ago | parent [-]

I agree that understanding legacy code and code by other people is part of the job, but I don't see how these points are related.

> What happens if the person who wrote the code went on vacation?

They get yelled at, because shipping code at 5 pm on Friday and then leaving for vacation is typically considered a "dick move".

> What happens if the code is many years old and no current team member has touched the code?

Then the issue probably isn't caused by a recent deployment?

blutoot 9 hours ago | parent | prev | next [-]

I think the opposite will happen - leadership will forgo this attitude of "reverse course on the first outage".

Teams will figure out how to mitigate such situations in the future without sacrificing the potential upside of "fully autonomous code changes made on production systems" (e.g., investing more in a production-like environment for test coverage).

Software engineering purists have to get out of some of these religious beliefs.

verdverm 7 hours ago | parent | next [-]

> Software engineering purists have to get out of some of these religious beliefs

To me, the Claude superfans like yourself are the religious ones, the way you run around proffering unsubstantiated claims like this and believing in / anthropomorphizing these models way too much. Is it because Anthropic is an abbreviation of Anthropomorphic?

blutoot 6 hours ago | parent | next [-]

I would have been in the skeptics' camp 3-4 months ago. Opus-4.5 and GPT-5.2 have changed my mind. I'm not talking about mere code completion; I am talking about these models AND the corresponding agents playing a really, really capable software engineer + tester + SRE/Ops role.

The caveat is that we have to be fairly good at steering them in the right direction, as things stand today. It is exhausting to do it the right way.

verdverm 5 hours ago | parent [-]

I agree the latest gen of models, Opus 4.5 and Gemini 3, are more capable. 5.2 is OpenAI squeezing as much as they can out of 4, because they haven't had a successful pre-training run since Ilya left.

I disagree that they are really, really capable engineers et al. They have moments where they shine like one. They also have moments where they perform worse than a new grad/hire. That is not what a really, really capable engineer looks like. I don't see this fundamentally changing, even with all the improvements we are seeing. The problem is lower level and more core than anything that adds more layers on top can resolve; those layers only paper over it as best they can.

throwaway7783 6 hours ago | parent | prev [-]

In my own anecdotal experience, Claude Code found a bug in production faster than I could. I was the author of said code, which was written 4 years ago by hand. The GP's claim perhaps is not all that unsubstantiated. My role is moving more towards QA/PM nowadays.

verdverm 6 hours ago | parent [-]

I have many wins with AI, and I also have many hard fails. This experience helps me understand where their limits are.

Do you have hard fails to share along with your wins? Or are we only going to share our wins like stonk hussies?

throwaway7783 5 hours ago | parent [-]

For sure. Not hard fails, but bad fixes. It confidently thought it fixed a bug, but it really didn't. I could only tell (it was fairly complex) because I tried reproducing it before and after. Ultimately, I believe there was not sufficient context provided to it. It has certainly failed to do what I asked it to do in round 1 and round 2, but eventually got it right (a rendering issue for a barcode designer).

These incidents have become less and less frequent over the last year - switching to Opus made failures rarer. Same thing for code reviews: most of the output is fluff, but it does give useful feedback if the instructions are good. For example, I asked for a blind code review of a PR ("Review this PR") and it gave some generic commentary. When I made the prompt more specific ("Follow the API changes across modules and see impact"), it found a serious bug.
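
A minimal sketch of what those two review passes could look like, assuming Claude Code's non-interactive print mode (claude -p); the prompts are the ones quoted above, and the exact flags may differ by version:

    # Sketch only: compares a generic review prompt to a targeted one.
    # Assumes the Claude Code CLI is installed and "claude -p" runs a
    # single non-interactive query (flag behavior may vary by version).
    import subprocess

    def review(prompt: str) -> str:
        # Run Claude Code once with the given prompt and return its reply.
        result = subprocess.run(
            ["claude", "-p", prompt],
            capture_output=True, text=True, check=True,
        )
        return result.stdout

    # Generic ask: tends to produce boilerplate commentary.
    print(review("Review this PR"))

    # Targeted ask: the version that surfaced the real bug.
    print(review("Follow the API changes across modules and see impact"))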

The number of times I had to give up in frustration has been going down over the last year. So I tend to believe a swarm of agents could do a decent job of autonomous development/maintenance over the next few years.

majormajor 2 hours ago | parent | prev | next [-]

Leadership will do what customers demand, which in most cases won't be ship-constantly-and-just-mitigate.

How to find problems through testing before they happen is a decades-long unsolved problem, sadly.

jauntywundrkind 4 hours ago | parent | prev [-]

Even lesser agents are incredibly good and incredibly fast at using tools to inspect the system, come up with ideas for things to check, and check them. I absolutely agree: we will 100% give the agents far more power. A browser, a debugger for the server that works with that browser instance, a database tool, an OpenTelemetry tool.

Teams are going to figure out how to mitigate bad deploys by using even more AI and giving it even better information-gathering tools.

tarxvf 9 hours ago | parent | prev [-]

If companies were generally capable of that level of awareness, they would not operate the way they do.