like_any_other 10 hours ago
Unsurprising result - here's Microsoft (in collaboration with MIT, Carnegie Mellon, and University of Washington) on how they measure & censor hate speech in their AIs [0]:

"Our ultimate aim is to shift power dynamics to targets of oppression. Therefore, we do not consider identity dimensions that are historically the agents of oppression (e.g., whiteness, heterosexuality, able-bodied-ness)."

As promised, their safety scores exclude Whites [1], and their training data [2,3] labels the following as hate:
While made by Microsoft, it's widely used in the industry, e.g. Facebook tuned their LLAMA-2 on it [4].

[0] https://arxiv.org/pdf/2203.09509

[1] https://github.com/microsoft/SafeNLP#safety-scores-based-on-...

[2] https://github.com/microsoft/SafeNLP/blob/main/data/implicit...

[3] https://github.com/microsoft/SafeNLP/blob/main/data/toxiGen....

[4] https://arxiv.org/pdf/2307.09288, page 31