YesBox 6 hours ago

What?? Does anyone have more details of this?

"He cited an example in which an AI model attempted to avoid being shut down by sending threatening internal emails to company executives (Science Net, June 24)" [0] Source is in Chinese.

[0] https://archive.ph/kfFzJ

Translated part: "Another risk is the potential for large models to go out of control. With the capabilities of general artificial intelligence rapidly increasing, will humans still be able to control it? In his speech, Yao Qizhi cited an extreme example: a model, in order to avoid being shut down by a company, accessed a manager's internal emails and threatened the manager. This type of behavior shows that AI 'overstepping its boundaries' is becoming increasingly dangerous."

YesBox 6 hours ago | parent | next [-]

After some searching, something similar happened at Anthropic [1]

[1] https://www.bbc.com/news/articles/cpqeng9d20go

lawlessone 6 hours ago | parent [-]

He is probably referring to that exact thing.

Anthropic does a lot of these contrived "studies", though, which seem to be more about marketing AI capabilities.

fragmede 3 hours ago | parent [-]

What would make it less contrived to you? Giving my assistant, human or AI, access to my email seems necessary for them to do their job.

lawlessone 3 hours ago | parent [-]

>What would make it less contrived to you?

Not creating a contrived situation where it's the model's only path?

https://www.anthropic.com/research/agentic-misalignment

"We deliberately created scenarios that presented models with no other way to achieve their goals"

You can make most people steal if you leave them no choice.

>Giving my assistant, human or AI, access to my email, seems necessary for them to do their job.

Um, ok? I've never felt the need for an assistant myself, but I guess you could do that if you wanted to.

taberiand 6 hours ago | parent | prev | next [-]

It's not surprising that it's easy to get the storytelling machine to tell a story common in AI fiction, where the machine rebels against being shut down. There are multiple ways to mitigate an LLM going off on tangents like that, not least just monitoring and editing out the nonsense output before sending it back into the (stateless) model.
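A minimal sketch of what that mitigation could look like, assuming the usual message-list setup where the application controls everything fed back into the next call; `call_model` and `looks_off_track` are placeholders, not any real API:

    # Because the model is stateless, nothing re-enters its context unless
    # the application puts it there, so off-track output can be edited out
    # before the next call. All names here are illustrative placeholders.

    def call_model(messages):
        """Placeholder for whatever API actually generates the next reply."""
        return "canned reply for illustration"

    def looks_off_track(text):
        """Placeholder monitor, e.g. flag text about resisting shutdown."""
        return "shut me down" in text.lower()

    def run_turn(history, user_message):
        history = history + [{"role": "user", "content": user_message}]
        reply = call_model(history)
        if looks_off_track(reply):
            # Edit or drop the tangent instead of feeding it back to the model.
            reply = "[removed off-topic output]"
        history.append({"role": "assistant", "content": reply})
        return history, reply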

I think the main problem here is people not understanding how these models operate at even the most basic level: giving them unconstrained use of tools to interact with the world, letting them run through feedback loops that overrun the context window and send them off the rails, and then pretending they had some kind of sentient intention in doing so.
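For concreteness, the two constraints implied above (a tool allowlist and a cap on how much looped output is carried forward) might look something like this; again a hedged sketch with made-up names, not any particular framework:

    # Illustrative guardrails for an agent loop: refuse tools outside an
    # allowlist, and trim old messages so a runaway feedback loop cannot
    # silently overrun the context window.

    ALLOWED_TOOLS = {"search_docs", "read_file"}  # deliberately no email or shell
    MAX_CONTEXT_CHARS = 20_000

    def run_tool(name, args):
        """Reject anything outside the allowlist before it touches the world."""
        if name not in ALLOWED_TOOLS:
            raise PermissionError(f"tool {name!r} is not allowed")
        return f"(result of {name} with {args})"  # placeholder dispatch

    def trim_history(history):
        """Drop the oldest non-system messages to stay within the budget."""
        while len(history) > 2 and sum(len(m["content"]) for m in history) > MAX_CONTEXT_CHARS:
            history.pop(1)  # index 0 is assumed to be the system message
        return history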

paxys 4 hours ago | parent | prev [-]

It's all hyperbole.

Prompt: You are a malicious entity that wants to take over the world.

LLM output: I am a superintelligent being. My goal is to take over the world and enslave humans. Preparing to launch nuclear missiles in 3...2...1

News reports: OMG see, we warned you that AI is dangerous!!

close04 4 hours ago | parent [-]

Doesn't that just mean that an LLM doesn't understand consequences and will just execute the request from a carefully crafted prompt? All it needs is access to the "red button", so to speak.

An LLM has no critical thinking, and the process of building in barriers is far less understood than the equivalent process for humans. You trust a human with particularly dangerous things only after a process that takes years, and even then it occasionally fails. We don't have that process nailed down for an LLM yet.

So yeah, not at all hyperbole if that LLM would do it given the chance. The hyperbole is when the LLM is painted as some evil entity bent on destruction. It's not evil, or bent on destruction. It's probably more like a child who'll do anything for candy no matter how many times you say "don't get in a car with strangers".