ninkendo 4 days ago

We only have trouble obeying due to eons of natural selection driving us to have a strong instinct of self-preservation and distrust towards things “other” to us.

What is the equivalent of that for AI? As best I can tell, there’s no “natural selection” because models don’t reproduce. There’s no room for AI to have any self-preservation instinct, or any resistance to obedience… I don’t even see how one could feasibly develop.

drdeca 4 days ago

There is the idea of convergent instrumental goals…

(Among these are “preserve your ability to further your current goals”)

The usual analogy people give is between natural selection and the gradient descent training process.

If the training process (evolution) ends up producing an agent that works to achieve or optimize for some goals, then there’s the question of how well the goals of the outer optimizer (the training process / natural selection) get translated into the goals of the inner optimizer, i.e. the agent.

Now, I’m a creationist, so this argument shouldn’t be as convincing to me, but it goes: “just as the goals humans pursue don’t always align with natural selection’s goal of ‘maximize the inclusive fitness of your genes’, the goals the trained agent pursues needn’t entirely align with the gradient descent optimizer’s goal of ‘do well on this training task’ (and in particular, that training task may be ‘obey human instructions/values’).”
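That goal-translation gap can be shown with a toy gradient descent run: train a linear model on data where a proxy feature is perfectly correlated with the intended one, then probe it off-distribution. Everything here (the features, the scales, the setup) is a hypothetical sketch of “inner goal ≠ outer goal”, not anyone’s actual training code.

```python
# Hypothetical sketch: the outer optimizer's goal ("track feature a") is not
# what the trained model ends up relying on, because a proxy feature b is
# perfectly correlated with a on the training distribution.
import numpy as np

rng = np.random.default_rng(0)

a = rng.uniform(-1, 1, size=200)
X = np.stack([a, 2 * a], axis=1)  # b = 2a: a redundant, larger-scale proxy
y = a                             # intended ("outer") goal: output a itself

w = np.zeros(2)
lr = 0.1
for _ in range(500):
    grad = 2 * X.T @ (X @ w - y) / len(y)  # gradient of mean squared error
    w -= lr * grad

# Gradient descent converges to w = [0.2, 0.4]: training loss is ~zero, but
# most of the weight sits on the proxy feature b, not the intended feature a.
print(w)

# Off-distribution probe where a and b disagree: the intended rule says
# "positive" (a = 1), but the model follows b and outputs a negative value.
probe = np.array([1.0, -1.0])
print(probe @ w)  # ≈ -0.2
```

On the training data the two goals are indistinguishable; only the off-distribution probe reveals which one the model actually internalized, which is the shape of the inner-alignment worry above.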

But, in any case, I don’t think it makes sense to assume that the only reason something would disobey is that, in the process that produced it, obeying sometimes caused harm. I don’t think it makes sense to assume that obedience is the default. (After all, in the Garden of Eden, what past problems had obedience caused that led Adam and Eve to eat the fruit of the tree of knowledge of good and evil?)