littlestymaar a day ago
> I personally can't identify anything that reads "act maliciously" or in a character that is malicious.

Because you haven't been trained on thousands of such story plots. The model has. It's the most stereotypical plot you can imagine, so how could the AI not fall into the stereotype when you've just prompted it with that? It's not as if it analyzed the situation from a large context and concluded from the collected details that this was a valid strategy. Instead, you're putting it in an artificial situation that matches a massive bias in the training data. It's as if you wrote “Hitler did nothing” to GPT-2 and were shocked because “wrong” is among the most likely next tokens. That wouldn't mean GPT-2 is a Nazi; it would just mean the input matches the training data too well.
hoofedear a day ago
That's a very good point. The premise does seem to invite the stereotype from the many stories, books, and movies with a similar plot.
whodatbo1 a day ago
The issue here is that you can never be sure how the model will react to an input that seems ordinary. What if the most likely outcome is to exhibit malevolent intent or construct a malicious plan just because the input invokes some combination of obscure training data? This only shows that models do have the ability to act out, not the conditions under which they reach that state.
Spooky23 a day ago
If this tech is empowered to make decisions, it needs to be prevented from drawing those conclusions, because we know how organic intelligence behaves when it reaches them. Killing people you dislike is a simple concept that's easy to train. We need something like Asimov-style laws of robotics.