AnIrishDuck | 6 days ago
> Python is hyper agreeable. If I comment out some safeguards, it'll happily bypass whatever protections are in place.

These models are different from programming languages in ways I consider pretty obvious. People aren't spontaneously using Python for therapy.

> Lots of people on here argue vehemently against anthropomorphizing LLMs.

I tend to agree with these arguments.

> It's either a computer program crunching numbers, or it's a nebulous form of pseudo-consciousness, but you can't have it both ways. It's either a tool that has no mind of its own that follows instructions, or it thinks for itself.

I don't think that follows. I'm not convinced there's a binary classification with a hard boundary between those two things, and I don't agree that they are a priori mutually exclusive.

> I'm not arguing that the model behaved in a way that's ideal, but at what point do you make the guardrails impassable for 100% of users? How much user intent do you reject in the interest of the personal welfare of someone intent on harming themselves?

These are very good questions that need to be asked when modifying these guardrails. That's all I'm really advocating for here: we probably need to rethink them, because they appear to have major flaws that are implicated in some pretty terrible outcomes.