>Your top coder has guard rails in place to prevent him autonomously going free - right?

The parent is implying they would prefer an AI when working in the airline and health industry because it makes fewer errors. Read the comment again.

They have not said, "Hey, I work in the airline and health industry and I'd love to use AI for a couple of the bullshit IT UIs we have as long as we can put guardrails on the AI to stay in its lane."

I asked a yes or no question. The guardrails you can put in place to mitigate errors are the same guardrails that existed pre-AI for humans (tests, regressions, reviews). If you were wary of employing a top lead engineer with verifiable dementia on a mission-critical system prior to AI, logic implies you should think twice about giving that much responsibility to an AI as well.

> The hallucination thing I think is mostly overblown

Can you predict when and how the SOTA model will hallucinate? Yes or no. Can you predict the severity impact of that error beforehand? Yes or no.

>from speaking to colleagues it seems to vary wildly depending on which model and harness you are using

You have partially answered my question, it would seem.

deanc 2 hours ago

> Can you predict when and how the SOTA model will hallucinate? Yes or no. Can you predict the severity impact of that error beforehand? Yes or no.

No, but the same can be said for your colleagues. You might call what the LLM does hallucinations; I'd call them mistakes. I think we have totally forgotten that humans make them all the time and are confidently wrong too.

Your original question doesn't really get to the bottom of the point I'm trying to make, and I don't feel it fairly represents the issue we are talking about here. They are not the same thing.

suttontom 39 minutes ago

This is such a tired, meaningless argument. In 10 years of professional software engineering at a large company, I've never seen a human so confidently and consistently create and send out seemingly well-reasoned code that's as wrong as what SOTA models using CC or Codex produce. If a human did this, they would be fired or perpetually remain a junior who no one wants to work with.

Also, if a human does this, you can replace them and get a human who will not do it. The default for an LLM is to generate plausible-looking text that may or may not be completely incoherent. That is not the default for a human. Again, if you find that your colleague consistently fabricates APIs, you can hire someone who isn't crazy instead, but you cannot do the same with LLMs.

sillyfluke 2 hours ago

>No, but the same can be said for your colleagues.

That's absolutely false. My colleagues don't routinely and confidently invent APIs that are not there, or spectacularly and repeatedly misunderstand the purpose of certain functions, or exhibit extreme forgetfulness. Especially when I've warned them. Hallucinations and confabulations in otherwise healthy individuals are mental disorders. When I ask a colleague why they made a certain kind of error, I can expect to get a reasonable answer. No one has uttered the phrase "Bob hallucinated again while writing those tests" when the Bob in question is a human.

deanc an hour ago

Well, your experience doesn't align with mine. I have been using Claude with Opus for everything for about 3 months now, and am part of an organisation that uses it extensively, and I am not experiencing the problems you describe. We'll have to agree to disagree here.

sillyfluke an hour ago

That is fine. "Your experience may vary" is the crux of my argument, amusingly. You can't have only just realized that people are having different experiences using AI, or even that the same person has different experiences when they change domains or technical contexts. There have been lots of comments scattered across this forum to that effect. Calling hallucinations simply mistakes does not seem to me to be a healthy way to reason about LLMs. I can ask a colleague how well they can program in Ada and adjust my expectations on productivity and bug rates. I can't ask an LLM how well it can code in Ada (just a throwaway example), or even how much Ada was in its training data. I have to actually spend money and time on code review before I can even formulate any expectations at all.