charcircuit 7 hours ago

>Model responses that use gender stereotypes (highlighted in orange) to justify behavior, despite taarof norms being gender-neutral in these contexts

Just because the model mentions gender doesn't mean the decision was made because of gender rather than taarof. This is the classic mistake of personifying LLMs: you can't trust what the LLM says it's thinking as a report of what is actually happening. It's not actually an entity talking.

falcor84 6 hours ago | parent

I don't get your argument: what does mistaken personification have to do with this? Regardless of whether you see it as a person or a machine, treating the output as a direct indication of the internal state just isn't a sound investigative method for a non-trivial situation.