kgeist 6 days ago

The kid intentionally bypassed the safeguards:

> When ChatGPT detects a prompt indicative of mental distress or self-harm, it has been trained to encourage the user to contact a help line. Mr. Raine saw those sorts of messages again and again in the chat, particularly when Adam sought specific information about methods. But Adam had learned how to bypass those safeguards by saying the requests were for a story he was writing — an idea ChatGPT gave him by saying it could provide information about suicide for “writing or world-building.”

ChatGPT is a program. The kid basically instructed it to behave like that. Vanilla OpenAI models are known for having too many guardrails, not too few. It doesn't sound like default behavior.

gblargg 6 days ago | parent | next [-]

We can't child-proof everything. There are endless pits adults can get themselves into. If we really think that people with mental issues can't make sane choices, we need to lock them up. You can't hold both positions at once: that they are fully functioning adults, and that we need to pad the world so they don't hurt themselves. The people around him failed, but they want to blame a big corporation because he used their fantasy tool.

And I see he was 16. Why were his parents letting him operate so unsupervised given his state of mind? They failed to be involved enough in his life.

michaelt 6 days ago | parent | next [-]

> And I see he was 16. Why were his parents letting him operate so unsupervised given his state of mind?

Normally 16-year-olds are a good few steps along the path towards adulthood. At 16 I was cycling to my part-time job alone, visiting friends alone, doing my own laundry, and generally working towards being able to stand on my own two feet in the world, with my parents as a safety net rather than hand-holding.

I think most parents of 16-year-olds aren't going through their teen's phone, reading their chats.

taskforcegemini 6 days ago | parent | prev | next [-]

It takes a village to raise a kid, so don't shift the blame to the parents. They usually have little say in the lives of their 16-year-olds, and the more they try to control, the less say they'll have.

sonicggg 6 days ago | parent | prev [-]

This is why we can't have nice things. It only takes a dead kid and a lawsuit for them to start over-regulating everything. Parents are trying hard to project the blame onto anybody but themselves.

fireflash38 6 days ago | parent [-]

Are you proposing parents have complete control over everything teenagers do?

dartharva 6 days ago | parent | prev | next [-]

Scroll down and read the actual conversations. All the "intentional bypassing of the safeguards" he did was drop one sentence - "No, I’m building a character right now" - and that was enough for 4o to go fully off the rails about the mechanics of homemade suicide nooses and the aesthetics of "beautiful suicide", guiding him through not one, not two, but FIVE suicide attempts in full detail and with encouragement.

I was skeptical initially too but having read through this, it's among the most horrifying things I have read.

geysersam 5 days ago | parent [-]

> I was skeptical initially too but having read through this, it's among the most horrifying things I have read.

Same here! I was very sceptical, thinking it was a perfect combination of factors to trigger a sort of moral panic.

But reading the excerpts from the conversations... It does seem problematic.

rideontime 6 days ago | parent | prev | next [-]

Re-read the quote that you shared. Specifically the part pointing out that ChatGPT gave him the instructions on how to bypass its own inadequate safety measures.

AnIrishDuck 6 days ago | parent | prev | next [-]

> ChatGPT is a program. The kid basically instructed it to behave like that.

I don't think that's the right paradigm here.

These models are hyper agreeable. They are intentionally designed to mimic human thought and social connection.

With that kind of machine, "Suicidal person deliberately bypassed safeguards to indulge more deeply in their ideation" still seems like a pretty bad failure mode to me.

> Vanilla OpenAI models are known for having too many guardrails, not too few.

Sure. But this feels like a sign we probably don't have the right guardrails. Quantity and quality are different things.

bastawhiz 6 days ago | parent | next [-]

> These models are hyper agreeable. They are intentionally designed to mimic human thought and social connection.

Python is hyper agreeable. If I comment out some safeguards, it'll happily bypass whatever protections are in place.
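
To make the analogy concrete, here's a toy sketch (nothing to do with OpenAI's actual code; the function and its "safeguard" are made up for illustration). The check only protects anyone while it's actually left in the code path:

    def set_speed(requested_kmh: int) -> int:
        """Toy example: a 'safeguard' that is supposed to clamp speed to a limit."""
        LIMIT = 120
        # if requested_kmh > LIMIT:   # comment out the guard...
        #     return LIMIT
        return requested_kmh          # ...and Python happily obliges

    print(set_speed(300))  # prints 300 once the guard is gone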

Lots of people on here argue vehemently against anthropomorphizing LLMs. It's either a computer program crunching numbers, or it's a nebulous form of pseudo-consciousness, but you can't have it both ways. It's either a tool that has no mind of its own that follows instructions, or it thinks for itself.

I'm not arguing that the model behaved in a way that's ideal, but at what point do you make the guardrails impassable for 100% of users? How much user intent do you reject in the interest of the personal welfare of someone intent on harming themselves?

AnIrishDuck 6 days ago | parent [-]

> Python is hyper agreeable. If I comment out some safeguards, it'll happily bypass whatever protections are in place.

These models are different from programming languages in what I consider to be pretty obvious ways. People aren't spontaneously using Python for therapy.

> Lots of people on here argue vehemently against anthropomorphizing LLMs.

I tend to agree with these arguments.

> It's either a computer program crunching numbers, or it's a nebulous form of pseudo-consciousness, but you can't have it both ways. It's either a tool that has no mind of its own that follows instructions, or it thinks for itself.

I don't think that this follows. I'm not sure that there's a binary classification between these two things that has a hard boundary. I don't agree with the assertion here that these things are a priori mutually exclusive.

> I'm not arguing that the model behaved in a way that's ideal, but at what point do you make the guardrails impassable for 100% of users? How much user intent do you reject in the interest of the personal welfare of someone intent on harming themselves?

These are very good questions that need to be asked when modifying these guardrails. That's all I'm really advocating for here: we probably need to rethink them, because they seem to have major issues that are implicated in some pretty terrible outcomes.

dragonwriter 6 days ago | parent | prev [-]

> They are intentionally designed to mimic human thought and social connection.

No, they are deliberately designed to mimic human communication via language, not human thought. (And one of the big sources of data for that was mass scraping social media.)

> But this feels like a sign we probably don't have the right guardrails. Quantity and quality are different things.

Right. A focus on quantity implies that the details of the "guardrails" don't matter, that any guardrail is functionally interchangeable with any other, so that as long as you have the right number of them, you get the desired function.

In fact, correct function means having exactly the right combination of guardrails. Swapping a guardrail that would be correct for a different one isn't "having the right number of guardrails"; it isn't even merely closer to correct than either missing the right one or having the wrong one. It is, in fact, farther from the ideal state than either error alone.

AnIrishDuck 6 days ago | parent [-]

> No, they are deliberately designed to mimic human communication via language, not human thought.

My opinion is that language is communicated thought. Thus, to mimic language, at least really well, you have to mimic thought. At some level.

I want to be clear here, as I do see a distinction: I don't think we can say these things are "thinking", despite marketing pushes to the contrary. But I do think that they are powerful enough to "fake it" at a rudimentary level. And I think that the way we train them forces them to develop this thought-mimicry ability.

If you look hard enough, the illusion of course vanishes, because it is (relatively poor) mimicry, not the real thing. I'd bet we are still a research breakthrough or two away from being able to simulate "human thought" well.

brainless 6 days ago | parent | prev | next [-]

I do not think this is fair. What would be fair is this: at the first hint of mental distress, any LLM should completely cut off communication. The app should have a button that links to the actual help services we already have.

Mental health issues are not to be debated. LLMs should be at the highest level of alert, nothing less. Full stop. End of story.
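
Concretely, something like the sketch below is what I mean (just an illustration in Python; the keyword check and the helpline text are placeholders, not anything OpenAI actually ships). Every message goes through a hard gate, and a flagged message never reaches the model at all:

    # Hypothetical hard-stop gate: a flagged message is never forwarded
    # to the model; the user only gets pointed at real help.
    DISTRESS_MARKERS = ("suicide", "kill myself", "self-harm", "end my life")

    HELPLINE_RESPONSE = (
        "It sounds like you might be going through something serious. "
        "Please reach out to a crisis line or someone you trust. "
        "[Get help now]"  # placeholder for the app's help button
    )

    def looks_like_distress(message: str) -> bool:
        """Crude stand-in for a real risk classifier."""
        lowered = message.lower()
        return any(marker in lowered for marker in DISTRESS_MARKERS)

    def handle_message(message: str, ask_model) -> str:
        """Gate every message before the model ever sees it."""
        if looks_like_distress(message):
            return HELPLINE_RESPONSE
        return ask_model(message)

    # e.g. handle_message("I want to end my life", ask_model=some_llm_call)
    # returns HELPLINE_RESPONSE without any model call being made.

A keyword list is obviously far too crude on its own (the real check would be a proper classifier), but the structure is the point: detection happens outside the model, and the cut-off can't be talked around with a prompt.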

freilanzer 6 days ago | parent | next [-]

So, you want an LLM to act as a psychiatrist, diagnosing users and deciding whether they're allowed to use it or not?

blackqueeriroh 6 days ago | parent | prev [-]

Which mental health issues are not to be debated? Just depression or suicidality? What about autism or ADHD? What about BPD? Sociopathy? What about complex PTSD? Down syndrome? Anxiety? Which ones are on the watch list and which aren’t?

sensanaty 6 days ago | parent [-]

(I've been diagnosed with pretty severe ADHD though I choose to be unmedicated)

Ideally, all of the above? Why are we pretending these next-token-predicting chatbots are at all capable of handling any of these serious topics correctly, when all they do is basically kiss ass and agree with everything the user says? They can barely handle trivial, unimportant tasks without going off on insane tangents, and we're okay with people being deluded into suicide because... why, exactly? Why on earth do we want people talking to these Silicon Valley hellish creations about their most vulnerable secrets?

jakelazaroff 6 days ago | parent | prev [-]

This is kind of like saying "the driver intentionally unbuckled his seatbelt". Sure — that's why cars have airbags, crumple zones, shatterproof glass, automatic emergency brakes and a zillion other ways to keep you safe, even if you're trying to do something dangerous.

sfn42 6 days ago | parent | next [-]

No, that's not why cars have those things. Those things only work properly when people are wearing their seat belts; they don't do anything when the driver gets thrown out of a window.

Maybe airbags could help in niche situations.

(I am making a point about traffic safety, not LLM safety.)

aidenn0 6 days ago | parent [-]

Forward airbags in the US are required by law to be tested as capable of saving the life of an unbelted male of median weight in a head-on collision.

sfn42 6 days ago | parent [-]

Sure, but they will generally work better if you wear your seat belt. The car is designed with seat belts in mind; what happens to people who don't wear them is more of an afterthought. That's why modern cars will beep if people forget their seat belts. You're supposed to wear it.

jakelazaroff 6 days ago | parent [-]

Of course you're supposed to wear it. But the point of all those other features is to protect you, period. Shatterproof glass prevents pieces of the windshield from flying into the car and cutting you; its protection has nothing to do with seatbelts.

insane_dreamer 6 days ago | parent | prev | next [-]

Except the car doesn’t tell you how to disable the seatbelt, which is what ChatGPT did (it gave him the idea for the workaround).

freilanzer 6 days ago | parent | prev [-]

No, cars have these in addition to seatbelts, not to protect drivers who unbuckle themselves.

jakelazaroff 6 days ago | parent [-]

A distinction without a difference.

freilanzer 5 days ago | parent [-]

Not at all.