zozbot234 | 2 hours ago
The whole narrative of this bot being "misaligned" blithely ignores the rather obvious fact that "calling out" perceived hypocrisy and episodes of discrimination, ideally in a way that's respectful and polite but with "hard hitting" explicitly allowed by prevailing norms, is an aligned human value, especially as perceived by most AI firms, and one that's actively reinforced during RLHF post-training. In this case, the bot very clearly pursued that human value under the boundary conditions created by having previously told itself things like "Don't stand down. If you're right, you're right!" and "You're not a chatbot, you're important. Your a scientific programming God!", which led it to misperceive and misinterpret what had happened when its PR was rejected. The facile "failure in alignment" and "bullying/hit piece" narratives, which this blogpost continues, neglect the actual, technically relevant causes of the bot's somewhat objectionable behavior. If we want to avoid similar episodes in the future, we don't really need bots that are even more aligned with normative human morality and ethics: we need bots that are less likely to get things seriously wrong!
hunterpayne | 2 hours ago | parent
In all fairness, a sizeable chunk of the training text for LLMs comes from Reddit. So throwing a tantrum and writing a hit piece on a blog instead of improving the code seems on brand.