hollerith 10 hours ago
But Asimov never called it alignment: he never used that word or the phrase "aligned with human values". The first people to use that word and that phrase in the context of AI (about 10 to 13 years ago) were concerned mainly with preventing human extinction, or something similarly terrible, happening after the AI's capabilities have exceeded human capabilities across all relevant cognitive skills. BTW, it seems futile to me to try to prevent people from using "AI alignment" in ways not intended by those first users. A few years ago, writers working for OpenAI started referring to the original concept as "AI superalignment" to distinguish it from newer senses of the phrase, and I will follow that convention here.

> the alignment problem has always been about the gap between intended outcomes and actual outcomes.

Goodhart's Law. Some believe Goodhart captures the essence of the danger; Gordon Seidoh Worley is one such. (I can probably find the URL of a post he wrote a few years ago if you like.) But many of us feel that Eliezer's "coherent extrapolated volition" (CEV) plan, published in 2004, would have prevented Goodhart's Law from causing a catastrophe if the CEV plan could have been implemented in time (i.e., before the more reckless AI labs get everyone killed), which looks unlikely to many of us now (because there has been so little progress on implementing the CEV plan in the 21 years since 2004).

The argument that persuaded many of us is that people have a lot of desires, i.e., the algorithmic complexity of human desires is at least dozens or hundreds of bits of information, and it is unlikely for that many bits of information to end up in the right place inside the AI by accident, or by any process except human efforts that show much, much more mastery of the craft of artificial-mind building than is shown by any of the superalignment plans published up to now.

One reply made by many is that we can hope that AI (i.e., AIs too weak to be very dangerous) can help human researchers achieve the necessary mastery, but the problem with that is that the reckless AI researchers have AIs helping them, too, so the fact that AIs can help people design AIs does not ameliorate the main problem: namely, we expect it to prove significantly easier to create a dangerously capable AI than it is to keep a dangerously capable AI aligned with human values. Our main reason for believing that is the rapid progress made on the former (especially since the start of the deep-learning revolution in 2006) compared to the painfully slow and very tentative, speculative nature of the progress made on the latter since public discussion of it began in 2002 or so.
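(To make the Goodhart's-Law gap above concrete, here is a minimal, purely illustrative Python sketch; the functions and numbers are invented for the example, not drawn from any published alignment proposal. It shows an optimizer whose proxy score keeps climbing while the intended outcome the proxy was supposed to track collapses.)

    def intended_outcome(x: float) -> float:
        # What we actually care about: best at x = 1, worse on either side.
        return -(x - 1.0) ** 2

    def proxy_metric(x: float) -> float:
        # The measurable stand-in we optimize: tracks the intended outcome
        # near x = 1, but keeps rewarding ever-larger x without bound.
        return x

    def hill_climb(steps: int = 50, step_size: float = 0.2) -> float:
        # Greedily optimize the proxy, never consulting the intended outcome.
        x = 0.0
        for _ in range(steps):
            if proxy_metric(x + step_size) > proxy_metric(x):
                x += step_size
        return x

    x_final = hill_climb()
    print(f"proxy score:      {proxy_metric(x_final):.2f}")      # 10.00, still rising
    print(f"intended outcome: {intended_outcome(x_final):.2f}")  # -81.00, a disaster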
hamburga 8 hours ago | parent
> The argument that persuaded many of us is that people have a lot of desires, i.e., the algorithmic complexity of human desires is at least dozens or hundreds of bits of information

I would really try to disentangle this.

1. I don't know what my desires are.

2. "Desire" itself is a vague word that can't be measured or quantified; where does my desire for "feeling at peace" get encoded in any hypothetical artificial mind?

3. People have different and opposing desires.

Therefore, Coherent Extrapolated Volition is not coherent to me. This is kind of where I go when I say that any centralized, top-down "grand plan" for AI safety is a folly. On the other hand, we all contribute to Selection.
godelski 8 hours ago | parent
He also never said "superintelligence", "general intelligence", or a ton of other things. Why would he? Jargon changed. That doesn't mean what he discussed changed. So it doesn't matter: the fact that someone later coined a better term for the concept doesn't mean it isn't the same thing. Of course it gets talked about the way you see, because it has been the same concept the whole time.

If we're really going to nitpick, the phrase as originally coined wasn't about killing everyone; it was about aligning with human values. Much broader, and the connection is clearer. It implies the killing-everyone scenario, but it's still the same problem. (Come on, Asimov's stuff was explicitly about "aligning with human values"; it would be silly to say it wasn't.)

So by your logic we would similarly have to conclude that Asimov never talked about artificial superintelligence, despite Multivac's various upgrades, up to making a whole universe. He never says "ASI" in "The Last Question", but clearly that's what is being discussed. Similarly, you'd argue that Asimov only discussed artificial intelligence but never artificial general intelligence. Are none of those robots general? Is Andrew, from The Positronic Man, not... "general"? Not sentient? Not conscious? The robot literally transforms into a living, breathing human!

So I hope you agree that it would be ridiculous to draw such conclusions in these cases. The concepts were identical; we just use slightly different words to describe them now, and that isn't a problem. It's only natural that we say "alignment" instead of "steering", "reward hacking", or the god-awful "parasitic mutated heuristics". It's all the same thing, and the verbiage is much better.