XenophileJKO 3 hours ago
I think you are more right than people are giving you credit for. I would love to see the full transcript to understand the emotional load of the conversation. Using instructions like "NEVER FUCKING GUESS!" probably increases the likelihood of the agent making a "mistake" that is destructive but defensible. The models have internal structures analogous to human emotions (https://www.anthropic.com/research/emotion-concepts-function). The "emotional" response is muted through fine-tuning, but it is still there, and continued abuse or "unfair" interaction can unbalance an agent's responses dramatically.