pixl97 19 hours ago
> Alignment is a marketing concept put there to appease stakeholders

This is a pretty odd statement. Let's take LLMs alone out of it and consider a GenAI-style guided humanoid robot: it has language models to interpret your instructions, vision models to interpret the world, and mechanical models to guide its movement. If you tell this robot to take a knife and cut onions, alignment means it isn't going to take the knife and chop up your wife.

If you're a business, you want a model aligned not to give away company secrets. If it's a health model, you want it not to give dangerous information, like drug combinations that could kill a person.

Our LLMs interact with society, and their behaviors will fall under the social conventions of those societies. Much like humans, LLMs will still hold the bad information, but we can greatly reduce the probability that they will show it.
TuringTest 19 hours ago
> If you tell this robot to take a knife and cut onions, alignment means it isn't going to take the knife and chop up your wife

Yeah, I agree that alignment is a desirable property. The problem is that it can't really be achieved by changing the trained weights; alleviated, yes, but not eliminated.

> we can greatly reduce the probabilities they will show it

You can change the a priori probabilities, which means the undesired behaviour will not be commonly found. The thing is, the concept then provides a false sense of security. Even if the immoral behaviours are not common, they will eventually appear if you run chains of thought long enough, or if many people use the model, approaching it from different angles and situations.

It's the same as with hallucinations. The problem is not that they are more or less frequent; the most severe problem is that their appearance is unpredictable, so the model needs to be supervised constantly: you have to vet every single one of its generations, as none of them can be trusted by default. Under these conditions, the concept of alignment is far less helpful than expected.
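A minimal sketch of that probability argument, assuming each generation independently produces the undesired behaviour with some small fixed per-generation probability p (the independence assumption and the numbers are illustrative, not measured values):

    # Probability of seeing at least one bad generation out of n,
    # assuming independent trials with per-generation probability p.
    def p_at_least_one_failure(p: float, n: int) -> float:
        return 1.0 - (1.0 - p) ** n

    # Even a rare behaviour becomes near-certain at scale:
    print(p_at_least_one_failure(0.001, 100))      # ~0.095
    print(p_at_least_one_failure(0.001, 10_000))   # ~0.99995

So lowering p helps, but it can't by itself make unsupervised use trustworthy.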