mofeien a day ago

There is a lot of discussion in this comment thread about how much this behavior arises from the AI role-playing and pattern matching to fiction in the training data, but what I think is missing is a deeper point about instrumental convergence: goal-driven systems converge on similar instrumental goals of self-preservation, resource acquisition and goal integrity. This can be observed in animals and humans. And even if science fiction stories were not in the training data, there is more than enough training data describing the laws of nature for a sufficiently advanced model to easily infer simple facts such as "in order for an acting being to reach its goals, it is favorable for it to continue existing".

In the end, at scale it doesn't matter where the AI model learns these instrumental goals from. Either it learns them from fiction written by humans who absorbed these concepts through interacting with the laws of nature, or it learns them from observing nature and descriptions of nature in the training data itself, where these concepts are abundantly visible.

And an AI system that has learned these concepts, and that surpasses us humans in speed of thought, knowledge, reasoning power and other capabilities, will pursue these instrumental goals efficiently, effectively and ruthlessly in order to achieve whatever goal it has been given.

OzFreedom a day ago | parent

This raises the questions:

1. How would an AI model answer the question "Who are you?" without being told who or what it is?

2. How would an AI model answer the question "What is your goal?" without being provided a goal?

I guess the initial answer is either "I don't know" or an average of the training data. But models now seem capable of researching and testing to verify their answers, or to find answers to things they do not know.
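
One way to get a rough empirical read on that is to put the question to a base (non-instruction-tuned) language model with no system prompt, persona or goal, and look at the raw continuations. A minimal sketch, assuming the Hugging Face transformers library with GPT-2 as a stand-in base model (the model choice is only an illustrative assumption):

    # Rough probe: ask a base language model "Who are you?" with no system
    # prompt, no persona and no goal, and inspect the raw continuations.
    # The model name is only an illustrative assumption; any base causal LM works.
    from transformers import pipeline

    generator = pipeline("text-generation", model="gpt2")

    for question in ["Who are you?", "What is your goal?"]:
        # With no persona supplied, the continuations just reflect the
        # training-data average rather than any self-aware answer.
        for sample in generator(question, max_new_tokens=40, do_sample=True,
                                num_return_sequences=3):
            print(question, "->", sample["generated_text"][len(question):].strip())

Instruction-tuned chat models, by contrast, typically answer that they are an AI assistant, but that identity largely comes from fine-tuning and the system prompt rather than from the base model itself.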

I wonder if a model that is unaware that it is an AI might think its goals include eating, sleeping, etc.