TuringTest 19 hours ago
> If you tell this robot to take a knife and cut onions, alignment means it isn't going to take the knife and chop off your wife

Yeah, I agree that alignment is a desirable property. The problem is that it can't really be achieved by changing the trained weights; it can be alleviated, yes, but not eliminated.

> we can greatly reduce the probabilities they will show it

You can change the a priori probabilities, which means that the undesired behaviour will not be commonly found. The thing is, the concept then provides a false sense of security. Even if the immoral behaviours are not common, they will eventually appear if you run chains of thought long enough, or if many people use the model, approaching it from different angles or situations.

It's the same as with hallucinations. The problem is not that they are more or less frequent; the most severe problem is that their appearance is unpredictable, so the model needs to be supervised constantly: you have to vet every single one of its generations, as none of them can be trusted by default. Under these conditions, the concept of alignment is severely less helpful than expected.
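Back-of-the-envelope, the compounding argument looks like this (a minimal sketch; the per-generation failure rate p is an assumed, illustrative number, not a measured one):

    # Probability that at least one failure appears across n independent
    # generations, given a tiny per-generation failure probability p.
    # p = 1e-6 is purely illustrative.
    p = 1e-6

    for n in (1_000, 1_000_000, 100_000_000):
        at_least_one = 1 - (1 - p) ** n
        print(f"n={n:>11,}: P(at least one failure) = {at_least_one:.4f}")

Even at one-in-a-million per generation, a million generations already puts the chance of at least one failure around 63%, and at a hundred million it is effectively certain.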
pixl97 14 hours ago
> then the concept provides a false sense of security. Even if the immoral behaviours are not common, they will eventually appear if you run chains of thought long enough, or if many people use the model approaching it from different angles or situations.

Correct, and this is also why humans have a non-zero crime/murder rate.

> Under these conditions, the concept of alignment is severely less helpful than expected.

Why? What you're asking for is a machine that never breaks. If you want that, build yourself a finite state machine; just don't expect you'll ever get anything that looks like intelligence from it.