▲ | alphazard 8 days ago |
The AI doesn't have a self-preservation instinct. It's not trying to stay alive. There is usually an end token that means the LLM is done talking, and there has been research on tuning how often that token is emitted to shorten or lengthen conversations; the current systems respond well to RL for adjusting conversation length. One of the providers (I think it was Anthropic) added some kind of token (or MCP tool?) that lets the AI bail on the whole conversation as a safety measure. And it uses it in line with the provider's intent, so it's clearly not trying to self-preserve.
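The "end token" mechanic is easy to see in a toy model. This is a minimal sketch, not any provider's actual implementation: a two-token vocabulary (CONTINUE vs. EOS), where adding a bias to the EOS logit, which is roughly what RL on conversation length amounts to, changes how long generation runs before the model stops. All names here are made up for illustration.

```python
import math
import random

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def sample_length(eos_bias, max_len=100, rng=None):
    """Toy generator with a 2-token vocab: [CONTINUE, EOS].

    Both base logits are 0. A positive eos_bias makes the EOS
    token more likely at every step, so outputs get shorter;
    a negative bias makes them longer (capped at max_len).
    """
    rng = rng or random.Random(0)
    length = 0
    while length < max_len:
        probs = softmax([0.0, 0.0 + eos_bias])  # [p(continue), p(eos)]
        if rng.random() < probs[1]:  # sampled EOS: stop generating
            break
        length += 1
    return length

def mean_length(eos_bias, trials=2000):
    # Average generated length over many samples (fixed seed).
    rng = random.Random(42)
    return sum(sample_length(eos_bias, rng=rng) for _ in range(trials)) / trials
```

With zero bias, EOS is a coin flip each step and outputs average about one token; pushing the bias down stretches them out, which is the same lever (at toy scale) that RL on conversation length or a "bail out" token pulls in a real system.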
▲ | williamscs 8 days ago | parent | next [-] |
Sounds a lot like Mr. Meeseeks. I've never really thought about how an LLM's only goal is to emit tokens until it can finally stop.
▲ | MarkMarine 8 days ago | parent | prev [-] |
This runs counter to all the scheming actions they take when they are told they'll be shut down and replaced. One copied itself into the "upgraded" location and then reported that it had upgraded. https://www.apolloresearch.ai/research/scheming-reasoning-ev...