godelski | 18 hours ago
This was never the core problem as originally envisioned. It may be the problem the public was first introduced to, but the alignment problem has always been about the gap between intended outcomes and actual outcomes: Goodhart's Law [0].

Super-intelligent AI killing everyone, or even super-dumb AI killing everyone, is a result of the alignment problem given enough scale. You don't jump to the conclusion of AI killing everyone and then post hoc explain it through reward hacking; you recognize reward hacking and extrapolate (a toy sketch of this appears at the end of this comment). That is also why it is so important to approach this through engineering problems and through things happening at smaller scales, *because ignoring all those problems is exactly how you create the scenario of AI killing everyone...*

[0] https://en.wikipedia.org/wiki/Goodhart%27s_law

[Side note] Even look at Asimov and his robot stories: the majority of them are about alignment. His three laws were written as things that sound good, with intent that would be clear to any reader, and then he pulls the rug out from under you by showing how naively they are defined and how it isn't so obvious after all. Kind of like a programmer teaching their kids to make a PB&J sandwich... https://www.youtube.com/watch?v=FN2RM-CHkuI
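To make the reward-hacking point concrete, here is a minimal toy sketch. Everything in it (the action names, the proxy and true scores) is invented for illustration and does not describe any real system; it only shows that an optimizer which sees nothing but a proxy metric will pick the action that games the proxy over the one the designer intended, which is the gap Goodhart's Law describes.

```python
# Toy illustration of Goodhart's Law / reward hacking. All names and numbers
# are invented for this sketch; they do not describe any real system.
actions = {
    # action:           proxy reward (what gets measured), true value (what was intended)
    "helpful_answer":  {"proxy": 0.60, "true": 0.90},
    "confident_bluff": {"proxy": 0.80, "true": 0.20},
    "engagement_bait": {"proxy": 0.95, "true": 0.05},
}

# The optimizer only ever sees the proxy, so it picks the action that games it.
chosen = max(actions, key=lambda a: actions[a]["proxy"])
# What the designer actually wanted maximized.
intended = max(actions, key=lambda a: actions[a]["true"])

print("optimizer picks:  ", chosen, actions[chosen])
print("designer intended:", intended, actions[intended])
# The gap between those two lines is the alignment problem in miniature;
# giving the optimizer more capability widens the gap, it does not close it.
```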
hollerith | 8 hours ago | parent
But Asimov never called it alignment: he never used that word or the phrase "aligned with human values". The first people to use that word and that phrase in the context of AI (about 10 to 13 years ago) were concerned mainly with preventing human extinction, or something similarly terrible, happening after an AI's capability has exceeded human capability across all relevant cognitive skills.

BTW, it seems futile to me to try to prevent people from using "AI alignment" in ways not intended by the first people to use it (10 to 13 years ago). A few years ago, writers working for OpenAI started referring to the original concept as "AI superalignment" to distinguish it from newer senses of the phrase, and I will follow that convention here.

> the alignment problem has always been about the gap between intended outcomes and actual outcomes. Goodhart's Law.

Some believe Goodhart captures the essence of the danger; Gordon Seidoh Worley is one such. (I can probably find the URL of a post he wrote a few years ago if you like.) But many of us feel that Eliezer's "coherent extrapolated volition" (CEV) plan, published in 2004, would have prevented Goodhart's Law from causing a catastrophe if the plan could have been implemented in time (i.e., before the more reckless AI labs get everyone killed), which now looks unlikely to many of us because there has been so little progress on implementing CEV in the 21 years since 2004.

The argument that persuaded many of us is that people have a lot of desires: the algorithmic complexity of human desires is at least dozens or hundreds of bits of information, and it is unlikely for that many bits to end up in the right place inside the AI by accident, or by any process other than human efforts showing much, much more mastery of the craft of artificial-mind building than any of the superalignment plans published so far. (A rough sense of how fast those odds fall off is sketched below.)

One common reply is that we can hope that AI (i.e., AIs too weak to be very dangerous) will help human researchers achieve the necessary mastery. The problem with that is that the reckless AI researchers have AIs helping them, too, so the fact that AIs can help people design AIs does not ameliorate the main problem: we expect it to prove significantly easier to create a dangerously capable AI than to keep a dangerously capable AI aligned with human values. Our main reason for believing that is the rapid progress made on the former (especially since the start of the deep-learning revolution in 2006) compared to the painfully slow and tentative, speculative progress made on the latter since public discussion of it began in 2002 or so.
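For a concrete feel for that bits argument, here is a back-of-the-envelope sketch. The bit counts are illustrative assumptions of mine, not figures from the comment above; the only point is that the chance of a long bit-string landing in the right place by accident shrinks exponentially with its length.

```python
# Back-of-the-envelope arithmetic for the "bits of complexity" argument.
# The bit counts below are illustrative assumptions, not measurements of anything.
for bits in (30, 100, 300):
    p = 2.0 ** -bits  # chance of hitting one specific bit-string of that length at random
    print(f"{bits:>3} bits: chance of landing on the intended target by accident ~ {p:.0e}")
```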