hamburga 13 hours ago

> The argument that persuaded many of us is that people have a lot of desires, i.e., the algorithmic complexity of human desires is at least dozens or hundreds of bits of information

I would really try to disentangle this.

1. I don't know what my desires are.

2. "Desire" itself is a vague word that can't be measured or quantified; where does my desire for "feeling at peace" get encoded in any hypothetical artificial mind?

3. People have different and opposing desires.

Therefore, Coherent Extrapolated Volition is not coherent to me.

This is kind of where I go when I say that any centralized, top-down "grand plan" for AI safety is a folly. On the other hand, we all contribute to Selection.

hollerith 13 hours ago | parent | next [-]

>I don't know what my desires are.

No need: it would be the AI's job to find out (after it has become very, very capable), not yours.

>"Desire" itself is a vague word that can't be measured or quantified

There are certain ways the future might unfold that would revolt you or make you very sad and others that don't have that problem. There is nothing vague or debatable about that fact even if we use vague words to discuss it.

Again, even the author of the CEV plan no longer puts any hope in it. My only reason for bringing it up is to flesh out my assertion that there are superalignment plans not vulnerable to Goodhart's Law/Curse, so Goodhart's Law cannot be the core problem with AI. At the very least, the core problem would need to be a combination of Goodhart with some other consideration, and I have been unable to imagine what that other consideration might be, unless perhaps it is the fact that every alignment plan I know of that is not vulnerable to Goodhart would be too hard to implement in the time humanity has left before unaligned AI kills us or at least permanently disempowers us.

But even then it strikes me as misleading or outright wrong to describe Goodhart as the core problem just because there probably won't be enough time to implement a plan not vulnerable to it. It seems much better to describe the core problem as the ease with which a non-superaligned AI can be created relative to how difficult it will be to create a superaligned one.

Again "superaligned" means the AI stays aligned even if its capabilities grow much greater than human capabilities.

godelski 11 hours ago | parent [-]

  > not vulnerable to Goodhart's Law/Curse
I'm going to need some good citations on that one.

CEV does not resolve Goodhart's Law. I'm really not sure you even can!

Let me give a really basic example to show you how something you might assume is perfectly aligned actually isn't.

Suppose you want to determine how long your pen is. You grab your ruler and measure it: 150 mm, right? Well... no. That's at least +/- 1 mm, and that's according to the ruler. How good is your ruler? What's its error relative to an actual meter? Are the graduations consistently spaced? Wait, did you mean with the clicker open or closed? That's at least a few mm of difference.

If you doubt me, go grab as many rulers and measuring devices as you can find. I'm sure you'll find differences. I have 4 rulers in my house and no two of them agree to within 250 um. The differences between them are easy to see by eye, though they're pretty close and good enough for any task I'm actually using them for. But if you asked me to maximize the pen's measured size, you can bet I'm not going to pick a ruler at random... I'm going to pick a very specific one... Because what are my other options? I can't make the pen any bigger without making an entirely new one or without controlling spacetime.
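
To make the ruler-shopping move concrete, here's a minimal Python sketch (the biases, noise level, and pen length are all invented for illustration): an optimizer told to maximize the *measured* length can't change the pen, so it exploits the instrument instead. That's Goodhart in miniature.

  # Hypothetical rulers, each with a fixed calibration bias (mm)
  # plus random read-off noise. All numbers are made up.
  import random

  TRUE_LENGTH_MM = 150.0
  ruler_bias_mm = {"A": -0.40, "B": 0.10, "C": 0.25, "D": -0.15}

  def measure(ruler, noise_sd=0.2):
      # Returns what the ruler *reports*, not what the pen *is*.
      return TRUE_LENGTH_MM + ruler_bias_mm[ruler] + random.gauss(0, noise_sd)

  def avg_reading(ruler, n=100):
      return sum(measure(ruler) for _ in range(n)) / n

  # Naive objective: "maximize the pen's measured length."
  # The optimizer can't lengthen the pen, so it shops for the
  # ruler with the largest positive bias ("C") instead.
  best = max(ruler_bias_mm, key=avg_reading)
  print(best, avg_reading(best))  # the proxy went up; the pen did not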

The point is that this is a trivial measurement where we take everything for granted, yet the measurement isn't perfectly aligned with the intent behind it. We can't do this exactly even for something as well defined as the meter! Physics gets in the way, and we'd have to spend exorbitant amounts of money just to get down to the nm scale. These are small amounts of misalignment and, frankly, they don't matter for most purposes. But whether they matter depends on the context. It is why engineers must include tolerances when they design parts. Without a tolerance, you haven't actually defined a measurement!
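
One way to see why the tolerance is part of the definition: a small sketch of guard-banding, where the acceptance window is shrunk by the instrument's own uncertainty. The numbers are invented, and this only follows the spirit of standard decision rules (e.g., ISO 14253-1), not the letter.

  def accept(measured_mm, nominal_mm, tol_mm, meas_uncert_mm):
      # Accept only if the part is in spec even after allowing
      # for how far off the instrument itself might be.
      lo = nominal_mm - tol_mm + meas_uncert_mm
      hi = nominal_mm + tol_mm - meas_uncert_mm
      return lo <= measured_mm <= hi

  # A 150 +/- 0.5 mm dimension checked with a caliper good to +/- 0.1 mm:
  print(accept(150.45, 150.0, 0.5, 0.1))  # False: too close to the edge
  # Pretend the measurement is exact and the same reading "passes":
  print(accept(150.45, 150.0, 0.5, 0.0))  # True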

So start extrapolating from this. How do you measure "what is a cat"? How do you measure happiness? How do you measure any of that stuff? Even the warped wooden meter stick you see in every elementary school classroom provides a better-defined measurement than any tool we have for these things!

We're not even capable of determining how misaligned we are!

And that was the point of my earlier post. These are the same thing! What do you think the engineering challenges are?! You're talking about a big problem and complaining that we're breaking it down into smaller, workable components. How else do you expect us to fix the big problem? It isn't going to happen through magic. It happens by factoring the problem into key components that can be understood in isolation, and then working back up by adding complexity. We're certainly not going to solve the massively complicated problem if we aren't allowed to solve the overly simple, naive versions first.
