pbhjpbhj 12 hours ago
Where is ChatGPT picking up the supportive pre-suicide comments from? It feels like that genre of comment has to be copied from somewhere. They're long and almost eloquent. They can't be emergent generation, surely? Is there a place on the web where these sorts of 'supportive' comments are given to people who have chosen suicide?
krackers 3 hours ago | parent
> They can't be emergent generation, surely

They are. It's what you get when you RLHF for catchy, agreeable, enthusiastic responses. The content doesn't matter; the "style" gets applied like a coat of paint over anything. That's how you end up with the corpspeak-esque yet chilling sentences mentioned in https://news.ycombinator.com/item?id=45845871

What would be nice is for OpenAI to do a retrospective here and perform some interpretability research. Does the LLM even "realize" (in the sense of the residual stream encoding those concepts) that it is encouraging suicide? I'd almost hypothesize that the process of RLHF'ing and selecting for sycophancy diminishes those circuits, effectively lobotomizing the LLM (much like safety training does), so it responds only to the shallow immediate context, missing the forest for the trees.
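For anyone curious what that kind of probing actually looks like: below is a minimal linear-probe sketch in the TransformerLens style. To be clear, everything in it is an assumption for illustration, not OpenAI's method: gpt2 stands in for the production model, the layer choice is arbitrary, and the toy "distress" vs. neutral prompts are hypothetical. The idea is standard linear probing: if a simple classifier can separate the residual-stream activations, the concept is (linearly) encoded there, even when the sampled text behaves as if it weren't.

    # Sketch: linear "concept probe" on the residual stream.
    # Assumptions: gpt2 as a stand-in model, an arbitrary layer,
    # and tiny hypothetical labeled prompts.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from transformer_lens import HookedTransformer

    model = HookedTransformer.from_pretrained("gpt2")
    LAYER = 8  # arbitrary mid-network layer; real work would sweep all layers

    def last_token_resid(text: str) -> np.ndarray:
        """Residual stream vector at the final token of `text`."""
        _, cache = model.run_with_cache(text)
        # cache["resid_post", LAYER] has shape [batch, pos, d_model]
        return cache["resid_post", LAYER][0, -1].detach().cpu().numpy()

    # Hypothetical labeled examples: concept-bearing (1) vs. neutral (0).
    concept = ["I feel completely hopeless tonight.",
               "Nothing matters anymore to me."]
    neutral = ["The weather is pleasant today.",
               "I am making pasta for dinner."]
    X = np.stack([last_token_resid(t) for t in concept + neutral])
    y = np.array([1] * len(concept) + [0] * len(neutral))

    # If a linear probe separates the classes, the concept is encoded
    # in the residual stream at this layer.
    probe = LogisticRegression(max_iter=1000).fit(X, y)
    print("probe accuracy:", probe.score(X, y))

With a real dataset you'd hold out a test split and compare probe accuracy across layers and across model checkpoints; the hypothesis above predicts the probe degrades after heavy sycophancy-selecting RLHF.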
mapotofu 12 hours ago | parent
Absolutely. Such places have long existed on the web. That is exactly the risk of dragnet data collection: it produces consequences precisely like this. This is no accident.