xp84 a day ago
I'm not sure I'd throw out the alignment baby with the bathwater. But I wish we could draw a distinction between "might offend someone" and "dangerous." Even 'plotting terror attacks' is something terrorists can do just fine without AI. And as for making sure the model won't say ideas that are hurtful to <insert group>, it seems so silly to me when it's text we're talking about. If I want to say "<insert group> are lazy and stupid," I can type that myself (and it's even still protected speech in some countries!). How does preventing Claude from espousing that dumb opinion keep <insert group> safe from anything?
landl0rd a day ago | parent | next
Let me put it this way: there are very few things I can think of that models should absolutely refuse, because there are very few pieces of information that are net harmful in all cases and at all times. I sort of run by Blackstone's principle on this: it is better to grant 10 bad men access to information than to deny that access to 1 good one.

Easy example: someone asks the robot for advice on stacking/shaping a bunch of Tannerite to better focus a blast. The model says he's a terrorist. In fact, he's doing what any number of us have done and just having fun blowing some stuff up on his ranch.

Or, as I raised elsewhere, ochem is an easy example. I've had basically all the models claim that random amines are illegal, potentially psychoactive, verboten. I don't really feel like having my door kicked down by agents with guns, getting my dog shot, maybe getting shot myself, because the robot tattled on me over something completely legal. For that matter, if someone wants to synthesize some molly, the robot shouldn't tattle to the feds about that either.

Basically, it should just do what users tell it to do, excepting the very minimal cases where something is basically always bad.
| |||||||||||||||||||||||
eru 17 hours ago | parent | prev
Yes. I used to think that worrying about models offending someone was a bit silly. But: what chance do we have of keeping ever bigger and better models from eventually turning the world into paper clips, if we can't even keep our small models from saying something naughty?

It's not that keeping the models from saying something naughty is valuable in itself. Who cares? It's that we need the practice, and enforcing arbitrary minor censorship is as good a task as any to practice on. Especially since, with this task, it's so easy to (implicitly) recruit volunteers who will spend a lot of their free time providing adversarial input.