tpurves 2 hours ago

The conceptual problem is that we keep wanting to compare AI behavior to that of traditional computers. The proper comparison is between AI, and how we trust or delegate to it, and the concept of delegating to other humans or even to domestic animals. Employees can be trained and given very specific skills and guidelines but still have agency and non-deterministic behavior. A seeing eye dog, a pack mule or a chariot horse will often, but not necessarily always, do what you ask of them. We've only been delegating to deterministic programmable machines for a very short part of human history. But as human societies, we've been collectively delegating a lot of useful activities to non-perfectly-dependable agents (i.e. each other) for a very long time. And as humans we've gotten done more than a few notable things in the last several millennia with this method. However, humans as delegates or as delegators have also done a lot of horrific things at scale too, both by accident and by design. And meanwhile (gestures broadly around everywhere) maybe humans actually aren't doing such an optimal job of running and governing everything important in the world?

When compared to how humans make a mess of things in the real world, how high does the bar really need to be for trusting AI agents? Even while far shy of perfect, AI could still be a step-function improvement over trusting ourselves.

w10-1 12 minutes ago | parent | next [-]

Human delegation is disciplined as much by incentive alignment as by instruction. The same is true for LLMs. The problem is that it's not possible to fully control intentions, LLM or human, because delegates/agents need autonomy to be useful.

The SOTA labs are working on making models more capable and then adding guardrails for safety. It would be better to work on baking in incentive alignment, which probably means eliciting more incentive details from the LLM user. That's what I'd be working on at Apple, where the user might be induced to share a level of local-only details that could align the AI agents.

_aavaa_ 14 minutes ago | parent | prev | next [-]

> how high does the bar really need to be for trusting AI agents.

You can hold a human responsible for what they do; you can reward them, fire them, sue them, etc.

You cannot do any of those things with an LLM. The threat of termination means nothing to an LLM.

abdjdoeke an hour ago | parent | prev | next [-]

Well, AI agents' thinking capabilities are inspired by our own “neural networks.” AI makes the same mistakes we do; they're just called different things.

How many people say something like “if I recall correctly”? That statement emphasizes that we think we know, but we add the disclaimer to protect ourselves from cancel culture.

People call that “Hallucination” when talking about an AI. It’s not hallucination, it’s beautiful imperfection.

givemeethekeys 2 hours ago | parent | prev [-]

A very talented junior employee that you can't trust with the keys.

GistNoesis an hour ago | parent | next [-]

The main difference is that this junior employee can't be held responsible if anything goes wrong. And the company which rented you this employee absolves itself of all responsibility too.

Here is a fresh example from today of what a junior employee does when given unlimited agentic power: https://www.reddit.com/r/ClaudeAI/comments/1sv7fvc/im_a_nurs...

tossandthrow an hour ago | parent [-]

Your example is not from a Jr developer but from a free agent.

I think you will find it very hard to hold a Jr dev in a corp responsible.

I actually think you will find it easier to work with agents at higher quality and lower legal risk than with Jr developers.

And this is only going to be amplified when it becomes common knowledge that AI poses less risk to projects than Jr staff.

ozgrakkurt an hour ago | parent | prev | next [-]

I understand you mean this in the sense that it is close in terms of the final work you get.

But in my opinion, it is not even remotely close to the reliability of an educated human, communication-wise.

If you gave a research task to a less experienced person, you wouldn’t expect them to convincingly lie about details.

It is useful as a review tool or boilerplate generator, but you would not use it in the same role you would use a human.

ipython 2 hours ago | parent | prev | next [-]

Who do you trust with the keys? In any well-run organization you have multiple layers of controls. The same concept applies here, and I think the gp commenter captured it very well.

givemeethekeys an hour ago | parent | next [-]

I think you'd trust someone with the keys when they've consistently shown that they can be trusted with less critical work. If you're having to constantly monitor someone's output, then promoting them is a liability.

The same applies to an AI model.

And, since the same model would be deployed by many teams, unexpected behavior from that model even for a small subset of those teams means that it can't be promoted.

pbronez 2 hours ago | parent | prev [-]

Yes. I think you can get agents to “Conscious competence” with a lot of well-designed oversight, direction and control. It works, but it’s fragile - nothing like the judgement needed to handle novel situations well.

https://en.wikipedia.org/wiki/Four_stages_of_competence