landl0rd a day ago
Yeah, we really should stop focusing on model alignment. The idea that it's more important for your AI to fucking report you to the police when it thinks you're being naughty than for it to actually work for more stuff is stupid.
xp84 a day ago | parent
I'm not sure I'd throw out all the alignment baby with the bathwater. But I wish we could draw a distinction between "might offend someone" and "dangerous." Even 'plotting terror attacks' is something terrorists can do just fine without AI. And as for making sure the model won't voice ideas that are hurtful to <insert group>, it seems so silly to me when it's text we're talking about. If I want to say "<insert group> are lazy and stupid," I can type that myself (and it's even protected speech in some countries still!). How does preventing Claude from espousing that dumb opinion keep <insert group> safe from anything?
latentsea a day ago | parent
That's probably true... right up until it reports you to the police.