drdeca 4 days ago

There is the idea of convergent instrumental goals…

(Among these are “preserve your ability to further your current goals”)

The usual analogy people give is between natural selection and the gradient descent training process.

If the training process (evolution) ends up producing an “agent that works to achieve/optimize-for some goals”, then there’s the question of how well the goals of the optimizer (the training process / natural selection) get translated into goals of the inner optimizer/agent.

Now, I’m a creationist, so this argument shouldn’t be as convincing to me, but the argument says that, “just as the goals humans pursue don’t always align with natural selection’s goal of 'maximize inclusive fitness of your genes', the goals the trained agent pursues needn’t entirely align with the goal of the gradient descent optimizer of 'do well on this training task' (and in particular, that training task may be 'obey human instructions/values')”.

But, in any case, I don’t think it makes sense to assume that the only reason something would not obey is that, in the process that produced it, obeying sometimes caused harm. I don’t think it makes sense to assume that obedience is the default. (After all, in the garden of Eden, what past problems did obedience cause that led Adam and Eve to eat the fruit of the tree of knowledge of good and evil?)