pixl97 14 hours ago
> then the concept provides a false sense of security. Even if the immoral behaviours are not common, they will eventually appear if you run chains of thought long enough, or if many people use the model, approaching it from different angles or situations.

Correct, and this is also why humans have a non-zero crime/murder rate.

> Under these conditions, the concept of alignment is severely less helpful than expected.

Why? What you're asking for is a machine that never breaks. If you want that, build yourself a finite state machine; just don't expect you'll ever get anything that looks like intelligence from it.
TuringTest 7 hours ago
> Why? What you're asking for is a machine that never breaks.

No, I'm saying that 'alignment' is a concept that doesn't help solve the problems that will appear when the machine ultimately breaks; in fact it makes them worse, because it doesn't account for when that will happen, and there's no way to predict that moment.

Following your metaphor of criminals: humans are kept within the law through social pressure, with others watching their behaviour and influencing it. And if someone breaks the law anyway, there's the police to stop them from doing it again.

None of this applies to an "aligned" AI. It is under no social pressure; its behaviour depends only on its own trained weights. So you would need to create a police force for robots, one that monitors the AI and stops it from doing harm. And it had better be a humane police force, or it will suffer the same alignment problems.

Thus, alignment alone is not enough, and it's a problem if people rely on it alone to trust the AI to work ethically.