▲ | AnIrishDuck 6 days ago | |||||||
> ChatGPT is a program. The kid basically instructed it to behave like that. I don't think that's the right paradigm here. These models are hyper agreeable. They are intentionally designed to mimic human thought and social connection. With that kind of machine, "Suicidal person deliberately bypassed safeguards to indulge more deeply in their ideation" still seems like a pretty bad failure mode to me. > Vanilla OpenAI models are known for having too many guardrails, not too few. Sure. But this feels like a sign we probably don't have the right guardrails. Quantity and quality are different things. | ||||||||
▲ | bastawhiz 6 days ago | parent | next [-] | |||||||
> These models are hyper agreeable. They are intentionally designed to mimic human thought and social connection. Python is hyper agreeable. If I comment out some safeguards, it'll happily bypass whatever protections are in place. Lots of people on here argue vehemently against anthropomorphizing LLMs. It's either a computer program crunching numbers, or it's a nebulous form of pseudo-consciousness, but you can't have it both ways. It's either a tool that has no mind of its own that follows instructions, or it thinks for itself. I'm not arguing that the model behaved in a way that's ideal, but at what point do you make the guardrails impassable for 100% of users? How much user intent do you reject in the interest of the personal welfare of someone intent on harming themselves? | ||||||||
| ||||||||
▲ | dragonwriter 6 days ago | parent | prev [-] | |||||||
> They deliberately are designed to mimic human thought and social connection. No, they are deliberately designed to mimic human communication via language, not human thought. (And one of the big sources of data for that was mass scraping social media.) > But this, to me, feels like a sign we probably don't have the right guardrails. Quantity and quality are different things. Right. Focus on quantity implies that the details of "guardrails" don't matter, and that any guardrail is functionally interchangeable with any other guardrail, so as long as you have the right number of them, you have the desired function. In fact, correct function is having the exactly the right combination of guardrails. Swapping a guardrail which would be correct with a different one isn't "having the right number of guardrails", or even merely closer to correct than either missing the correct one or having the different one, but in fact, farther from ideal state than either error alone. | ||||||||
|