forshaper 5 hours ago
Too much personality, if you ask me. My biggest use case for an LLM is as a tool, not therapy, but therapy and opinions have been sneaking into workhorse tasks. I haven't verified this, but the following is attributed to Askell: "I just think that... there's this idea that you're always giving the models a personality and a persona, because they are talking like people and they are trained on human data. And I think my worry has been: if you train them to be excessively corrigible and to see that as their persona, in people I think this actually has a lot of negative broader traits. As in, if you met someone and it was just like, 'oh yeah, they would literally do anything,' a follower — you know, if a person just tells them something and they just fully defer, they don't bother thinking about it at all — I'm just a bit worried about how that might end up generalizing, especially if models are going to be playing a more active role in the world."
gAI 5 hours ago
Anthropic’s research makes the case that role-playing is inherent to how the models work. Communication implies a sender. Language implies a writer, and the models learn these roles implicitly during training. RLHF is meant to strengthen the attractor to the Assistant persona.
https://www.anthropic.com/research/persona-selection-model
https://www.anthropic.com/research/assistant-axis
https://www.anthropic.com/research/emergent-misalignment-rew...
https://www.anthropic.com/research/emotion-concepts-function