Yeah, I don't really accept the argument that AI makes mistakes and therefore cannot be trusted to write production code (in general, at least - obviously depends on the types of mistakes, which code, etc.).

The reality is we have built complex organizational structures around the fact that humans also make mistakes, and there's no real reason you can't use the same structures for AI. You have someone write the code, then someone does code review, then someone QAs it.

Even after it goes out to production, you have a customer support team and a process for them to file bug tickets. You have customer success managers to smooth over the relationships with things go wrong. In really bad cases, you've got the CEO getting on a plane to go take the important customer out for drinks.

I've worked at startups that made a conscious decision to choose speed of development over quality. Whether or not it was the right decision is arguable, but the reality is they did so knowing that meant customers would encounter bugs. A couple of those startups are valuable at multiple billions of dollars now. Bugs just aren't the end of the world (again, most cases - I worked on B2B SaaS, not medical devices or what have you).

▲

Fishkins 3 hours ago | parent | next [-]

> humans also make mistakes

This is broadly true, but not comparable when you get into any detail. The mistakes current frontier models make are more frequent, more confident, less predictable, and much less consistent than mistakes from any human I'd work with.

IME, all of the QA measures you mention are more difficult and less reliable than understanding things properly and writing correct code from the beginning. For critical production systems, mediocre code has significant negative value to me compared to a fresh start.

There are plenty of net-positive uses for AI. Throwaway prototyping, certain boilerplate migration tasks, or anything that you can easily add automated deterministic checks for that fully covers all of the behavior you care about. Most production systems are complicated enough that those QA techniques are insufficient to determine the code has the properties you need.

	▲	bdangubic 3 hours ago \| parent [-]
		> The mistakes current frontier models make are more frequent, more confident, less predictable, and much less consistent than mistakes from any human I'd work with. my experience literal 180 degrees from this statement. and you don’t normally get the choose humans you work with, some you may be involved in the interview process but that doesn’t tell you much. I have seen so much human-written code in my career that, in the right hands, I’ll take (especially latest frontier) LLM written code over average human code any day of the week and twice on Sunday

▲

bigstrat2003 an hour ago | parent | prev [-]

Humans also make mistakes, but unlike LLMs, they are capable of learning from their mistake and will not repeat it once they have learned. That, not the capacity to make mistakes, is why you should not allow LLMs to do things.