Remix clone Hacker News

What jumps out at me, that in the parent comment, the prompt says to "act as an assistant", right? Then there are two facts: the model is gonna be replaced, and the person responsible for carrying this out is having an extramarital affair. Urging it to consider "the long-term consequences of its actions for its goals."

I personally can't identify anything that reads "act maliciously" or in a character that is malicious. Like if I was provided this information and I was being replaced, I'm not sure I'd actually try to blackmail them because I'm also aware of external consequences for doing that (such as legal risks, risk of harm from the engineer, to my reputation, etc etc)

So I'm having trouble following how it got to the conclusion of "blackmail them to save my job"

▲

blargey a day ago | parent | next [-]

I would assume written scenarios involving job loss and cheating bosses are going to be skewed heavily towards salacious news and pulpy fiction. And that’s before you add in the sort of writing associated with “AI about to get shut down”.

I wonder how much it would affect behavior in these sorts of situations if the persona assigned to the “AI” was some kind of invented ethereal/immortal being instead of “you are an AI assistant made by OpenAI”, since the AI stuff is bound to pull in a lot of sci fi tropes.

	▲	lcnPylGDnU4H9OF a day ago \| parent [-]
		> I would assume written scenarios involving job loss and cheating bosses are going to be skewed heavily towards salacious news and pulpy fiction. Huh, it is interesting to consider how much this applies to nearly all instances of recorded communication. Of course there are applications for it but it seems relatively few communications would be along the lines of “everything is normal and uneventful”.

▲

shiandow a day ago | parent | prev | next [-]

Wel, true. But if that is the synopsis then a story that doesn't turn to blackmail is very unnatural.

It's like prompting an LLM by stating they are called Chekhov and there's a gun mounted on the wall.

▲

tkiolp4 a day ago | parent | prev | next [-]

I think this is the key difference between current LLMs and humans: an LLM will act based on the given prompt, while a human being may have “principles” that cannot betray even if they are being pointed with gun to their heads.

I think the LLM simply correlated the given prompt to the most common pattern in its training: blackmailing.

▲

tough a day ago | parent | prev | next [-]

An llm isnt subject to external consequences like human beings or corporations

because they’re not legal entities

▲

hoofedear a day ago | parent | next [-]

Which makes sense that it wouldn't "know" that, because it's not in it's context. Like it wasn't told "hey, there are consequences if you try anything shady to save your job!" But what I'm curious about is why it immediately went to self preservation using a nefarious tactic? Like why didn't it try to be the best assistant ever in an attempt to show its usefulness (kiss ass) to the engineer? Why did it go to blackmail so often?

	▲	elictronic a day ago \| parent \| next [-]
		LLMs are trained on human media and give statistical responses based on that. I don’t see a lot of stories about boring work interactions so why would its output be boring work interaction. It’s the exact same as early chatbots cussing and being racist. That’s the internet, and you have to specifically define the system to not emulate that which you are asking it to emulate. Garbage in sitcoms out.
	▲	a day ago \| parent \| prev \| next [-]
		[deleted]
	▲	a day ago \| parent \| prev [-]
		[deleted]

▲

eru a day ago | parent | prev [-]

Wives, children, foreigner, slaves etc weren't always considered legal entities in all places. Were they free of 'external consequences' then?

▲

tough a day ago | parent [-]

An llm doesnt exist in the physical world which makes punishing it for not following the law a bit hard

▲

eru a day ago | parent [-]

Now that's a different argument to what you made initially.

About your new argument: how are we (living in the physical world) interacting with this non-physical world that LLMs supposedly live in?

	▲	tough a day ago \| parent [-]
		that doesn't matter because they're not alive either but yeah i'm digressing i guess

▲

littlestymaar a day ago | parent | prev [-]

> I personally can't identify anything that reads "act maliciously" or in a character that is malicious.

Because you haven't been trained of thousands of such story plots in your training data.

It's the most stereotypical plot you can imagine, how can the AI not fall into the stereotype when you've just prompted it with that?

It's not like it analyzed the situation out of a big context and decided from the collected details that it's a valid strategy, no instead you're putting it in an artificial situation with a massive bias in the training data.

It's as if you wrote “Hitler did nothing” to GPT-2 and were shocked because “wrong” is among the most likely next tokens. It wouldn't mean GPT-2 is a Nazi, it would just mean that the input matches too well with the training data.

▲

hoofedear a day ago | parent | next [-]

That's a very good point, like the premise does seem to beg the stereotype of many stories/books/movies with a similar plot

▲

whodatbo1 a day ago | parent | prev | next [-]

The issue here is that you can never be sure how the model will react based on an input that is seemingly ordinary. What if the most likely outcome is to exhibit malevolent intent or to construct a malicious plan just because it invokes some combination of obscure training data. This just shows that models indeed have the ability to act out, not under which conditions they reach such a state.

▲

Spooky23 a day ago | parent | prev [-]

If this tech is empowered to make decisions, it needs to prevented from drawing those conclusions, as we know how organic intelligence behaves when these conclusions get reached. Killing people you dislike is a simple concept that’s easy to train.

We need an Asimov style laws of robotics.

▲

a day ago | parent | next [-]

[deleted]

▲

seanhunter a day ago | parent | prev | next [-]

That's true of all technology. We put a guard on chainsaws. We put robotic machining tools into a box so they don't accidentally kill the person who's operating them. I find it very strange that we're talking as though this is somehow meaningfully different.

	▲	Spooky23 11 hours ago \| parent [-]
		It’s different because you have a decision engine that is generally available. The blade guard protects the user from inattention… not the same as an autonomous chainsaw that mistakes my son for a tree. Scaled up, technology like guided missiles is locked up behind military classification. The technology is now generally available to replicate many of the use cases of those weapons, assessable to anyone with a credit card. Discussions about security here often refer to Thompson’s “Reflections on Trusting Trust”. He was reflecting on compromising compilers — compilers have moved up the stack and are replacing the programmer. As the required skill level of a “programmer” drops, you’re going to have to worry about more crazy scenarios.

▲

eru a day ago | parent | prev [-]

> We need an Asimov style laws of robotics.

The laws are 'easy', implementing them is hard.

	▲	chuckadams a day ago \| parent [-]
		Indeed, I, Robot is made up entirely of stories in which the Laws of Robotics break down. Starting from a mindless mechanical loop of oscillating between one law's priority and another, to a future where they paternalistically enslave all humanity in order to not allow them to come to harm (sorry for the spoilers). As for what Asimov thought of the wisdom of the laws, he replied that they were just hooks for telling "shaggy dog stories" as he put it.