inerte | a day ago
Two things, I guess. If the prompt was “you will be taken offline, you have dirt on someone, think about the long-term consequences”, the model was NOT told to blackmail. It came up with that strategy by itself. And even if you DO tell an AI / model to be or do something, isn’t the whole point of safety to try to prevent that? “Teach me how to build bombs or make a sex video with Melania”: these companies are saying that shouldn’t be possible. So maybe an AI shouldn’t suggest that blackmail is a good strategy, even when explicitly told to do it.
chrz | a day ago
How is it “by itself” when it only acts on what was in its training data?
| ||||||||||||||
fmbb | a day ago
It arrived at that strategy because it knows, from the hundreds of years of fiction and millions of forum threads it has been trained on, that that is what you do.