Remix.run Logo
cseleborg 12 hours ago

If you create a chatbot, you don't want screenshots of it on X helping you to commit suicide or giving itself weird nicknames based on dubious historic figures. I think that's probably the use-case for this kind of research.

GuB-42 9 hours ago | parent [-]

Yes, that's what I meant by companies doing this to cover their asses, but then again, why should presumably independent researchers be so scared of that to the point of not even releasing a mild working example.

Furthermore, using poetry as a jailbreak technique is very obvious, and if you blame a LLM for responding to such an obvious jailbreak, you may as well blame Photoshop for letting people make porn fakes. It is very clear that the intent comes from the user, not from the tool. I understand why companies want to avoid that, I just don't think it is that big a deal. Public opinion may differ though.