| ▲ | randallsquared 2 hours ago |
| > In a year or so |
| |
| Look at the best models from Spring 2025 and compare with now (and similarly for Springs 2024 and 2025). Armstrong and lots of others are betting that this trend will continue, and if it does, the LLMs will ship code the LLMs understand, and whether any human specifically understands any particular part will mostly not matter. |
|
| ▲ | hn_throwaway_99 an hour ago | parent | next [-] |
| > the LLMs will ship code the LLMs understand, and whether any human specifically understands any particular part will mostly not matter. |
| |
| I find this particularly funny. There were more than a couple of Star Trek episodes where some alien planet depends on an advanced AI or other technology its people no longer understand, and it turns out the AI is actually slowly killing them, making them sterile, etc. (e.g. https://en.wikipedia.org/wiki/When_the_Bough_Breaks_(Star_Tr... ). Sure, Star Trek is fiction, but "humans rely on a technology they forgot how to make" is a pretty recurrent theme in human history. The FOGBANK saga was a pretty recent example: https://en.wikipedia.org/wiki/Fogbank It just amazes me that people think "Sure, this AI-generated code is kinda broken now, but all we need is just more AI code to fix it at some unknowable point in the future, because humans won't be able to understand it!" |
|
| ▲ | pron 2 hours ago | parent | prev | next [-] |
| And if the trend doesn't continue? I understand that a company with Coinbase's performance has little to lose and not many options, but many companies are in a better position. The problem is that executives could take the 15-20% productivity boost and be content, but they read stuff like this, get greedy, and don't understand the risk they're taking. |
| |
| ▲ | atonse 2 hours ago | parent | next [-] | | Even if the trend doesn’t continue, the current models are very, very good. They’re already better than the average programmer in the industry. | | |
| ▲ | zeroonetwothree an hour ago | parent [-] | | Maybe at some coding benchmark. Certainly not at actually shipping and maintaining production-grade software. |
| |
| ▲ | randallsquared 2 hours ago | parent | prev [-] | | Agreed! That will be an... "interesting" outcome, if so, for a lot of these companies. |
|
|
| ▲ | bix6 2 hours ago | parent | prev [-] |
| > and whether any human specifically understands any particular part will mostly not matter. |
| |
| This is how I feel. It’s building things for me that work. I don’t care how it works under the hood in many cases. |
| |
| ▲ | pron 2 hours ago | parent [-] | | It's not about caring how it works. It's about caring that it keeps working at all even after you add stuff to it for a year or three (and nearly all software written by companies is software they evolve). | | |
| ▲ | bix6 2 hours ago | parent [-] | | And who’s to say it won’t? It’s working now. I’m adding stuff and it’s still working. Why won’t that continue in year 3? | | |
| ▲ | pron 2 hours ago | parent | next [-] | | If you carefully read the agent's output, you'll see why. It adds layers upon layers of workarounds and defences that hide serious problems, until the codebase reaches a point where the agent can no longer understand it or work with it. All the tests pass right up until the moment when adding a feature or fixing a bug causes another bug, and then nothing and no one can save the codebase anymore. | | |
| ▲ | qingcharles an hour ago | parent [-] | | Maybe a year ago? Right now the LLMs I mainly use (GPT5.5, Opus 4.7) intuit exactly what I need from my brief specs and universally go above and beyond, creating code that is not only extremely high-quality but catches, in advance, a ton of the gotchas I would have stumbled on. Just a minute ago 5.5 looked at some human-written code of mine from last year, and while it was making the changes I asked for, it determined the existing code was too brittle (it was) and rewrote it better. It didn't mention this in its summary at the end; I only know because I often watch the thinking output as it goes past, before it all gets hidden behind a pop-open. | | |
| ▲ | s__s 26 minutes ago | parent [-] | | Interesting that we’ve had such different experiences. I was working with both of those models today, and on several occasions they proposed some pretty poor solutions. I also find I need to run an LLM code review or two against any code they produce just to get to the point where it’s ready for human review. In any case, they served as an extremely valuable tool. |
|
| |
| ▲ | titularcomment an hour ago | parent | prev | next [-] | | Maintaining software is like 80% of the job. | |
| ▲ | techblueberry 2 hours ago | parent | prev [-] | | Because the APIs it uses will change? Nothing in tech is static. And that’s just going to get worse re: this whole AI thing. |
|
|
|