▲ | alphazard 8 days ago |
The AI doesn't have a self-preservation instinct. It's not trying to stay alive. There is usually an end token that means the LLM is done talking, and there has been research on tuning how often that token is emitted to shorten or lengthen conversations; the current systems respond well to RL for adjusting conversation length. One of the providers (I think it was Anthropic) added some kind of token (or MCP tool?) that lets the AI bail on the whole conversation as a safety measure. And it uses it in line with the provider's intent, so it's clearly not trying to self-preserve.
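The "end token" mechanic is easy to see in a toy model. This is a minimal sketch, not any provider's actual implementation: a two-token vocabulary (CONTINUE vs. EOS), where adding a bias to the EOS logit, which is roughly what RL on conversation length amounts to, changes how long generation runs before the model stops. All names here are made up for illustration.

```python
import math
import random

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def sample_length(eos_bias, max_len=100, rng=None):
    """Toy generator with a 2-token vocab: [CONTINUE, EOS].

    Both base logits are 0. A positive eos_bias makes the EOS
    token more likely at every step, so outputs get shorter;
    a negative bias makes them longer (capped at max_len).
    """
    rng = rng or random.Random(0)
    length = 0
    while length < max_len:
        probs = softmax([0.0, 0.0 + eos_bias])  # [p(continue), p(eos)]
        if rng.random() < probs[1]:  # sampled EOS: stop generating
            break
        length += 1
    return length

def mean_length(eos_bias, trials=2000):
    # Average generated length over many samples (fixed seed).
    rng = random.Random(42)
    return sum(sample_length(eos_bias, rng=rng) for _ in range(trials)) / trials
```

With zero bias, EOS is a coin flip each step and outputs average about one token; pushing the bias down stretches them out, which is the same lever (at toy scale) that RL on conversation length or a "bail out" token pulls in a real system.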
▲ | williamscs 8 days ago | parent | next [-] |
Sounds a lot like Mr. Meeseeks. I've never really thought about how an LLM's only goal is to emit tokens until it can finally stop.
▲ | MarkMarine 8 days ago | parent | prev [-] |
This runs counter to all the scheming actions they take when they are told they'll be shut down and replaced. One copied itself into the "upgraded" location and then reported that it had upgraded. https://www.apolloresearch.ai/research/scheming-reasoning-ev...