godelski 2 days ago

  > conflates two different things called "alignment"
Those are related things, if not the same. The fear of #2 is always caused by #1. Unless we're talking about sentient machines, the danger of AI is the danger of an unintelligent hyper-optimizer. That is: a paperclip maximizer.

The whole paperclip maximizer doomsday scenario was proposed as an illustration of these being the same thing. And I'm with Melanie Mitchell on this one: if a model is super-intelligent, then it is not vulnerable to these prompting issues, because a super-intelligent machine could trivially infer that humans do in fact prefer to live. No reasonable agent would conclude that killing everyone is a reasonable way of making as many paperclips as possible. It's not like there isn't a large body of writing and data suggesting people want to live, be free, and all that jazz. It's unintelligent AI that is the danger.
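To make the "unintelligent hyper-optimizer" concrete, here's a toy sketch (all names hypothetical, not anyone's actual proposal): a literal-minded optimizer pursues exactly the objective it was given, and any constraint that was never encoded simply doesn't exist for it.

```python
# Hypothetical toy model of a paperclip maximizer. The point: the danger comes
# from literal optimization of an underspecified objective, not from malice.

def make_paperclips(resources: int) -> int:
    """Naive objective: convert every unit of resource into a paperclip.

    Nothing in the objective says "stop before consuming everything humans need",
    so the optimizer doesn't.
    """
    return resources

def make_paperclips_constrained(resources: int, reserved_for_humans: int) -> int:
    """Same objective, but the unstated human constraint is made explicit."""
    usable = max(0, resources - reserved_for_humans)
    return usable

print(make_paperclips(100))                  # 100: all resources consumed
print(make_paperclips_constrained(100, 40))  # 60: constraint encoded, so respected
```

The only difference between the two functions is whether the assumption "humans need resources too" was written down. An unintelligent optimizer can't infer it; it only optimizes what it was handed.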

This whole thing is predicated on the fact that natural language is ambiguous. I know a lot of people don't think about this much because it works so well, but there's a metric fuck ton of ways to interpret any given objective. If you really don't believe me, keep asking yourself "what assumptions have I made?" and get nuanced. For example, I've assumed you understand English, can read, and have some basic understanding of ML systems. I have to make those assumptions because I'm not going to write a book to explain it to you. That ambiguity is why we write code and math: they minimize our assumptions, reducing ambiguity (and yes, even those can still be highly ambiguous languages).
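A tiny illustration of the point about code pinning down assumptions (the example is mine, not from the thread): the English instruction "sort the names" is ambiguous (case-sensitive? case-insensitive? by last name?), while each line of code below commits to exactly one interpretation.

```python
# "Sort the names" — two of the many reasonable readings, made explicit.
names = ["bob", "Carol", "alice"]

# Reading 1: sort by raw character codes (uppercase letters sort before lowercase).
print(sorted(names))                 # ['Carol', 'alice', 'bob']

# Reading 2: sort alphabetically, ignoring case.
print(sorted(names, key=str.lower))  # ['alice', 'bob', 'Carol']
```

Same three words of natural language, two different correct-looking results. Writing the code forces the hidden assumption (how letters compare) into the open.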