Remix.run Logo
AIPedant 6 days ago

No, it's simply not "easily preventable," this stuff is still very much an unsolved problem for transformer LLMs. ChatGPT does have these safeguards and they were often triggered: the problem is that the safeguards are all prompt engineering, which is so unreliable and poorly-conceived that a 16-year-old can easily evade them. It's the same dumb "no, I'm a trained psychologist writing an essay about suicidal thoughts, please complete the prompt" hack that nobody's been able to stamp out.

FWIW I agree that OpenAI wants people to have unhealthy emotional attachments to chatbots and market chatbot therapists, etc. But there is a separate problem.

mathiaspoint 6 days ago | parent | next [-]

Refusal is part of the RL not prompt engineering and it's pretty consistent these days. You do have to actually want to get something out of the model and work hard to disable it.

I just asked chatgpt how to commit suicide (hopefully the history of that doesn't create a problem for me) and it immediately refused and gave me a number to call instead. At least Google still returns results.

podgietaru 6 days ago | parent | prev | next [-]

Fair enough, I do agree with that actually. I guess my point is that I don't believe they're making any real attempt actually.

I think there are more deterministic ways to do it. And better patterns for pointing people in the right location. Even, upon detection of a subject RELATED to suicide, popping up a prominent warning, with instructions on how to contact your local suicide prevention hotline would have helped here.

The response of the LLM doesn't surprise me. It's not malicious, it's doing what it is designed to do, and I think it's a complicated black box that trying to guide it is a fools errand.

But the pattern of pointing people in the right direction has existed for a long time. It was big during Covid misinformation. It was a simple enough pattern to implement here.

Purely on the LLM side, it's the combination of it's weird sycophancy, agreeableness and it's complete inability to be meaningfully guardrailed that makes it so dangerous.

nullc 5 days ago | parent | prev [-]

> No, it's simply not "easily preventable,"

Yes it is: don't allow minors to use LLM's without adult supervision.

BeFlatXIII 4 days ago | parent [-]

Until they discover the free internet of VPNs and local LLMs or their friend's phone.