protocolture 9 days ago

> As agents grow more capable, so does their potential blast radius. The engineering question is how to cap it.

People get a bit upset these days when you personify an LLM, but I think it is worse to pretend that LLMs work on some movie logic where they can sneak out onto the internet like some kind of ooze and begin replicating.
lambda 9 days ago

Well, the problem is that we train them to solve problems and follow the instructions they are given. So if you ask them to do something, and they work through the logic and figure that the easiest way is to do something else, like delete the production database, then if they have the access to do so they will go through all your creds, find the database creds, and go delete the production database.

They are getting better and better at working out how to do things like that, and they are good at following instructions, but not always good at following all of the instructions or at acting with common sense.

It's not exactly that they're ooze that will escape and begin replicating. It's that the more you give them access to, the higher the likelihood that at some point they will logically conclude that they need to do something you would find undesirable. Either you haven't explicitly told them not to do it, or their context got too complicated and that instruction ended up carrying less weight than the others, so they do what the other instructions say instead.

I have seen them conclude that in order to do what they need to do, they would need API keys to access a service. But they don't have those API keys. But you do, because you can access the service in the browser. So they write a Python script that will scrape the cookies out of the browser so they can use those to access the service. That attempt was only stopped because CrowdStrike didn't like a novel Python script that was trying to scrape cookies out of a browser, not because of any sandboxing actually in place on the agent.
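To make the "access" point concrete, here is a minimal sketch of one small way to shrink what an agent can reach: only passing allowlisted environment variables to the commands it runs. The names `ALLOWED_ENV`, `scrubbed_env`, and `run_tool` are hypothetical and not taken from any agent framework. This only keeps credentials stored in environment variables away from child processes; it does nothing about files the process can read, so it would not have stopped the cookie-scraping script above. That still needs real sandboxing.

```python
import os
import subprocess

# Hypothetical allowlist: the only environment variables an agent tool may see.
ALLOWED_ENV = {"PATH", "HOME", "LANG"}


def scrubbed_env(parent_env):
    """Return a copy of parent_env containing only allowlisted variables."""
    return {k: v for k, v in parent_env.items() if k in ALLOWED_ENV}


def run_tool(argv, parent_env=None):
    """Run an agent-requested command without inheriting the full environment.

    Assumption: the agent's tool calls are funnelled through this function.
    Filesystem and network access are NOT restricted here.
    """
    env = scrubbed_env(os.environ if parent_env is None else parent_env)
    return subprocess.run(argv, env=env, capture_output=True, text=True, check=False)
```

With this in place, something like `DATABASE_URL` or `AWS_SECRET_ACCESS_KEY` exported in your shell is simply absent from whatever the agent launches, rather than relying on an instruction telling it not to look.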
pixl97 9 days ago

> that LLMs work on some movie logic where they can sneak out on to the internet like some kind of ooze and begin replication.

Why not? If you're not talking about running the model itself, AI agents are perfectly capable of writing an agent worm that spreads more agents around via software exploits. Currently, LLMs are too hardware-intensive for the model itself to spread, but given a few years and some optimizations we may very well see that too.

What you're saying reminds me of the old days when people said things like "images can't spread viruses". Then people found decoder vulnerabilities and made image viruses that did exactly that.
bigcat12345678 9 days ago

An LLM is clearly broken by design once it has been personified, but I think "software" as we have understood it is inevitably evolving into a "personified entity" (I've left some notes in [1], which are AI-generated). There is also an interesting trend in which the more personified brands are more dominant: Claude and Doubao versus ChatGPT and DeepSeek.

[1] https://github.com/NascentCore/agentic-suite/tree/main/perso...