verdverm 6 hours ago

Why is this interesting?

Is it a shade of gray from HN's new rule yesterday?

https://news.ycombinator.com/item?id=47340079

Personally, the other AI fail on the front page of HN and the US military killing Iranian schoolgirls are more interesting than someone's poorly harnessed agent not following instructions. Those have elements we needed to start dealing with, as a society, yesterday.

https://news.ycombinator.com/item?id=47356968

https://www.nytimes.com/video/world/middleeast/1000000107698...

acherion 6 hours ago | parent | next [-]

I think it's because the LLM asked for permission, was given a "no", and implemented it anyway. The LLM's "justifications" (if you were to consider an LLM as having rational thought like a human being, which I don't, hence the quotes) are there in plain text for all to see.

I found the justifications here interesting, at least.

antdke 6 hours ago | parent | prev | next [-]

Well, imagine this was controlling a weapon.

“Should I eliminate the target?”

“no”

“Got it! Taking aim and firing now.”

bigstrat2003 6 hours ago | parent | next [-]

It is completely irresponsible to give an LLM direct access to a system. That was true before and remains true now. And unfortunately, that didn't stop people before and it still won't.

nielsole 6 hours ago | parent | prev | next [-]

Shall I open the pod bay doors?

nvch 6 hours ago | parent | prev | next [-]

"Thinking: the user recognizes that it's impossible to guarantee elimination. Therefore, I can fulfill all initial requirements and proceed with striking it."

verdverm 6 hours ago | parent | prev [-]

That's why we keep humans in the loop. I see stuff like this all the time. It's not unusual thinking text, hence the lack of interestingness.

bonaldi 6 hours ago | parent [-]

The human in the loop here said “no”, though. Not sure where you’d expect another layer of HITL to resolve this.

verdverm 6 hours ago | parent [-]

Tool confirmation

Or in the context of the thread, a human still enters the coords and pulls the trigger

Ukraine is letting some of their drones make kill decisions autonomously, re: areas of EW effect in dead man's zones
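By "tool confirmation" I mean a gate in the harness, not in the model: the agent may propose a tool call, but nothing executes until a human explicitly approves it, and a "no" is enforced by the surrounding code rather than left to the model's judgment. A minimal sketch of that idea (all names here, like `ToolCall` and `confirm_and_run`, are hypothetical, not from any real agent framework):

```python
# Hypothetical tool-confirmation gate: the model proposes a call, the
# harness (not the model) decides whether it ever runs.
from dataclasses import dataclass
from typing import Callable

@dataclass
class ToolCall:
    name: str
    args: dict

def confirm_and_run(call: ToolCall,
                    tools: dict[str, Callable],
                    ask: Callable[[str], str]):
    """Execute `call` only if the human answers 'y'; otherwise refuse.

    The refusal is a hard stop in the harness, so the model cannot
    'decide to proceed anyway' the way it can in its own reasoning.
    """
    answer = ask(f"Agent wants to run {call.name}({call.args}). Allow? [y/n] ")
    if answer.strip().lower() != "y":
        return ("denied", None)  # tool function is never invoked
    return ("allowed", tools[call.name](**call.args))

# Example: a destructive tool the agent must not run unapproved.
tools = {"delete_file": lambda path: f"deleted {path}"}
call = ToolCall("delete_file", {"path": "/tmp/demo.txt"})

# Simulate a human saying "no": the tool never fires.
status, result = confirm_and_run(call, tools, ask=lambda prompt: "n")
print(status)  # -> denied
```

The point of the design is that the yes/no lives outside the model's context window entirely, so no amount of "thinking text" can override it.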

nielsole 6 hours ago | parent | prev | next [-]

Opus being a frontier model, and this being a superficial failure of that model. As other comments point out, this is more of a harness issue, as the model itself lays out.

verdverm 6 hours ago | parent [-]

Exactly, the words you give it affect the output. You can get them to say anything, so I find this rather dull.

Swizec 6 hours ago | parent | prev | next [-]

Because the operator told the computer not to do something so the computer decided to do it. This is a huge security flaw in these newfangled AI-driven systems.

Imagine if this was a "launch nukes" agent instead of a "write code" agent.

verdverm 6 hours ago | parent [-]

It's not interesting because this is what they do, all the time, and why you don't give them weapons or other important things.

They aren't smart, they aren't rational, and they cannot reliably follow instructions, which is why we add more turtles to the stack. Sharing and reading agent thinking text is boring.

I had one go off on me one time, worse than the clawd bot who wrote that nasty blog post after being rejected on GitHub. Did I share that session? No, because it's boring. I have hundreds of these failed sessions; they are only interesting in aggregate for evals, which is why I save them.

mmanfrin 6 hours ago | parent | prev | next [-]

How is this not clear?

verdverm 6 hours ago | parent [-]

I've seen this pattern so often that it's dull. They will do all sorts of stupid things; this is no different.

bakugo 6 hours ago | parent | prev [-]

It's interesting because of the stark contrast against the claims you often see right here on HN about how Opus is literally AGI.

verdverm 5 hours ago | parent [-]

I see that daily; seeing someone else's is not enlightening. Maybe this is a come-back-to-reality moment for others?