EricMausler 5 days ago

> warmth and empathy don't immediately strike me as traits that are counter to correctness

This was my reaction as well. Something I don't see mentioned: I think it may have more to do with the training data than with the goal function. The vector space of data aligned with kindness may contain less accuracy than the vector space for neutrality, since people often forgo accuracy when being kind. I don't think it's a matter of conflicting goals, but rather of priming toward an answer drawn more heavily from the portion of the model trained on less accurate data.

I wonder if the accuracy would still be worse if the prompt were layered: ask it to coldly/bluntly derive the answer first, then translate that answer into a kinder tone (maybe as two separate prompts).
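A minimal sketch of what I mean, assuming the OpenAI Python client; the model name and prompt wording are placeholders, not anything from the article:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
MODEL = "gpt-4o-mini"  # hypothetical choice; swap in whichever model is being tested


def ask(system: str, user: str) -> str:
    """Single chat completion with a given system prompt."""
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ],
    )
    return resp.choices[0].message.content


def layered_answer(question: str) -> str:
    # Stage 1: cold, blunt derivation -- optimize for correctness only.
    draft = ask(
        "Answer as tersely and literally as possible. No reassurance, no hedging.",
        question,
    )
    # Stage 2: translate the already-derived answer into a kinder tone,
    # explicitly forbidding changes to the factual content.
    return ask(
        "Rewrite the following answer in a warm, empathetic tone. "
        "Do not add, remove, or soften any factual claim.",
        draft,
    )


if __name__ == "__main__":
    q = "I have a fever of 39.5C and a stiff neck. Should I wait it out?"
    print("Layered:", layered_answer(q))
    # Baseline for comparison: a single warm-toned prompt.
    print("One-shot warm:", ask("Answer warmly and empathetically.", q))
```

If the layered version holds accuracy while the one-shot warm prompt doesn't, that would support the idea that the loss comes from priming rather than from the warmth itself.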