Remix.run Logo
ec109685 5 days ago

It’s obviously fundamentally unsafe when Google, OpenAI and Anthropic haven’t released the same feature and instead use a locked down VM with no cookies to browse the web.

LLM within a browser that can view data across tabs is the ultimate “lethal trifecta”.

Earlier discussion: https://news.ycombinator.com/item?id=44847933

It’s interesting that in Brave’s post describing this exploit, they didn’t reach the fundamental conclusion this is a bad idea: https://brave.com/blog/comet-prompt-injection/

Instead they believe model alignment, trying to understand when a user is doing a dangerous task, etc. will be enough. The only good mitigation they mention is that the agent should drop privileges, but it’s just as easy to hit an attacker controlled image url to leak data as it is to send an email.

snet0 5 days ago | parent | next [-]

> Instead they believe model alignment, trying to understand when a user is doing a dangerous task, etc. will be enough.

Maybe I have a fundamental misunderstanding, but I feel like hoping that model alignment and in-model guardrails are statistical preventions, ie you'll reduce the odds to some number of zeroes preceeding the 1. These things should literally never be able to happen, though. It's a fools errand to hope that you'll get to a model where there is no value in the input space that maps to <bad thing you really don't want>. Even if you "stack" models, having a safety-check model act on the output of your larger model, you're still just multiplying odds.

cobbal 5 days ago | parent | next [-]

It's a common mistake to apply probabilistic assumptions to attacker input.

The only [citation needed] correct way to use probability in security is when you get randomness from a CSPRNG. Then you can assume you have input conforming to a probability distribution. If your input is chosen by the person trying to break your system, you must assume it's a worst-case input and secure accordingly.

zeta0134 5 days ago | parent | prev | next [-]

The sortof fun thing is that this happens with human safety teams too. The Swiss Cheese model is generally used to understand how the failures can line up to cause disaster to punch right through the guardrails:

https://medium.com/backchannel/how-technology-led-a-hospital...

It's better to close the hole entirely by making dangerous actions actually impossible, but often (even with computers) there's some wiggle room. For example, if we reduce the agent's permissions, then we haven't eliminated the possibility of those permissions being exploited, merely required some sort of privilege escalation to remove the block. If we give the agent an approved list of actions, then we may still have the possibility of unintended and unsafe interactions between those actions, or some way an attacker could add an unsafe action to the list. And so on, and so forth.

In the case of an AI model, just like with humans, the security model really should not assume that the model will not "make mistakes." It has a random number generator built right in. It will, just like the user, occasionally do dumb things, misunderstand policies, and break rules. Those risks have to be factored in if one is to use the things at all.

recursivecaveat 3 days ago | parent | next [-]

Humans are dramatically stronger than LLMs. An LLM is like a human you can memory wipe and try to phish hundreds of times a second until you find a script that works. I agree with what you're saying, but it's important to frame an LLM is not like a security guard who will occasionally let a former employee in because they recognize them. They can be attacked pretty relentlessly and once they're open they're wide open.

oskarkk 4 days ago | parent | prev [-]

Thank you for that link, that was a great read.

anzumitsu 5 days ago | parent | prev | next [-]

To play devils advocate, isn’t any security approach fundamentally statistical because we exist in the real world, not the abstract world of security models, programming language specifications, and abstract machines? There’s always going to be a chance of a compiler bug, a runtime error, a programmer error, a security flaw in a processor, whatever.

Now, personally I’d still rather take the approach that at least attempts to get that probability to zero through deterministic methods than leave it up to model alignment. But it’s also not completely unthinkable to me that we eventually reach a place where the probability of a misaligned model is sufficiently low to be comparable to the probability of an error occurring in your security model.

ec109685 5 days ago | parent | next [-]

The fact that every single system prompt has been leaked despite guidelines to the LLM that it should protect it, shows that without “physical” barriers, you are aren’t providing any security guarantees.

A user of chrome can know, barring bugs that are definitively fixable, that a comment on a reddit post can’t read information from their bank.

If an LLM with user controlled input has access to both domains, it will never be secure until alignment becomes perfect, which there is no current hope to achieve.

And if you think about a human in the driver seat instead of an LLM trying to make these decisions, it’d be easy for a sophisticated attacker to trick humans to leak data, so it’s probably impossible to align it this way.

QuadmasterXLII 5 days ago | parent | prev | next [-]

It’s often probabilistic- for example I can guess your six digit verification code exactly 1 in a million times, and if I 1 in a million lucky I can do something naughty once.

The problem with llm security is that if only 1 in a million prompts break claude and make it leak email, if I get lucky and find the golden ticket I can replay it on everyone using that model.

also, no one knows the probability a priory, unlike the code, but practically its more like 1 in 100 at best

const_cast 4 days ago | parent | prev | next [-]

> To play devils advocate, isn’t any security approach fundamentally statistical because we exist in the real world, not the abstract world of security models, programming language specifications, and abstract machines?

IMO no, most security modeling is pretty absolute and we just don't notice because maybe it's obvious.

But, for example, it's impossible to leak SSNs if you don't store SSNs. That's why the first rule of data storage is only store what you need, and for the least amount of time as possible.

As soon as you get into what modern software does, store as much as possible for as long as possible, then yes, breeches become a statistical inevitability.

We do this type of thing all the time. Can't get stuff stolen out of my car if I don't keep stuff in my car. Can't get my phone hacked and read through at the airport if I don't take it to the airport. Can't get sensitive data stolen over email if I don't send sensitive data over email. And on and on.

wat10000 4 days ago | parent | prev [-]

The difference is that LLMs are fundamentally insecure in this way as part of their basic design.

It’s not like, this is pretty secure but there might be a compiler bug that defeats it. It’s more like, this programming language deliberately executes values stored in the String type sometimes, depending on what’s inside it. And we don’t really understand how it makes that choice, but we do know that String values that ask the language to execute them are more likely to be executed. And this is fundamental to the language, as the only way to make any code execute is to put it into a String and hope the language chooses to run it.

closewith 4 days ago | parent | prev | next [-]

All modern computer security is based on trying to improbabilities. Public key cryptography, hashing, tokens, etc are all based on being extremely improbable to guess, but not impossible. If an LLM can eventually reach that threshold, it will be good enough.

recursive 4 days ago | parent [-]

Cryptography's risk profile is modeled against active adversaries. The way probability is being thrown around here is not like that. If you find 1 in a billion in the full training set of data that triggers this behavior, that's not the same as 1 in a billion against an active adversary. In cryptography there are vulnerabilities other than brute force.

zulban 5 days ago | parent | prev [-]

"These things should literally never be able to happen"

If we consider "humans using a bank website" and apply the same standard, then we'd never have online banking at all. People have brain farts. You should ask yourself if the failure rate is useful, not if it meets a made up perfection that we don't even have with manual human actions.

aydyn 5 days ago | parent | next [-]

Just because humans are imperfect and fall for scams and phishing doesn't mean we should knowingly build in additional attack mechanisms. That's insane. Its a false dilemma.

wat10000 4 days ago | parent | prev | next [-]

Go hire some rando off the street, sit them down in front of your computer, and ask them to research some question for you while logged into your user account and authenticated to whatever web sites you happen to be authenticated to.

Does this sound like an absolutely idiotic idea that you’d never even consider? It sure does to me.

Yes, humans also aren’t very secure, which is why nobody with any sense would even consider doing this either a human.

echelon 5 days ago | parent | prev [-]

The vast majority of humans would fall to bad security.

I think we should continue experimenting with LLMs and AI. Evolution is littered with the corpses of failed experiments. It would be a shame if we stopped innovating and froze things with the status quo because we were afraid of a few isolated accidents.

We should encourage people that don't understand the risks not to use browsers like this. For those that do understand, they should not use financial tools with these browsers.

Caveat emptor.

Don't stall progress because "eww, AI". Humans are just as gross.

We need to make mistakes to grow.

saulpw 5 days ago | parent | next [-]

We can continue to experiment while also going slowly. Evolution happens over many millions of years, giving organisms a chance to adapt and find a new niche to occupy. Full-steam-ahead is a terrible way to approach "progress".

echelon 5 days ago | parent [-]

> while also going slowly

That's what risk-averse players do. Sometimes it pays off, sometimes it's how you get out-innovated.

Terr_ 4 days ago | parent [-]

If the only danger is the company itself bankrupt, then please, take all the risks you like.

But if they're managing customer-funds or selling fluffy asbestos teddybears, then that's a problem. It's a profoundly different moral landscape when the people choosing the risks (and grabbing any rewards) aren't the people bearing the danger.

echelon 4 days ago | parent [-]

You can have this outrage when your parents are using browser user agents.

All of this concern is over a hypothetical Reddit comment about a technology used by early adopter technologists.

Nobody has been harmed.

We need to keep building this stuff, not dog piling on hate and fear. It's too early to regulate and tie down. People need to be doing stupid stuff like ordering pizza. That's exactly where we are in the tech tree.

forgetfreeman 4 days ago | parent | next [-]

"We need to keep building this stuff" Yeah, we really don't. As in there is literally no possible upside for society at large to continuing down this path.

const_cast 4 days ago | parent [-]

Well if we eliminate greed and capitalism then maybe at some point we can reach a Star Trek utopia where nobody has to work because we eliminate scarcity.

... Either that or the wealthy just hoard their money-printers and reject the laborers because they no longer need us to make money so society gets split into 99% living in feudal squalor and 1% living as Gods. Like in Jupiter Ascending. Man what a shit movie that was.

forgetfreeman 3 days ago | parent [-]

We basically eliminated scarcity a few generations ago and yet here we are.

wat10000 4 days ago | parent | prev [-]

This AI browser agent is outright dangerous as it is now. Nobody has been attacked this way... that we know of... yet.

It's one thing to build something dangerous because you just don't know about it yet. It's quite another to build something dangerous knowing that it's dangerous and just shrugging it off.

Imagine if Bitcoin was directly tied to your bank account and the protocol inherently allowed other people to perform transactions on your wallet. That's what this is, not "ordering pizza."

girvo 4 days ago | parent | prev [-]

When your “mistakes” are “a user has their bank account drained irrecoverably”, no, we don’t.

echelon 4 days ago | parent [-]

So let's stop building browser agents?

This is a hypothetical Reddit comment that got Tweeted for attention. The to-date blast radius of this is zero.

What you're looking at now is the appropriate level of concern.

Let people build the hacky pizza ordering automations so we can find the utility sweet spots and then engineer more robust systems.

skaul 5 days ago | parent | prev | next [-]

(I lead privacy at Brave and am one of the authors)

> Instead they believe model alignment, trying to understand when a user is doing a dangerous task, etc. will be enough.

No, we never claimed or believe that those will be enough. Those are just easy things that browser vendors should be doing, and would have prevented this simple attack. These are necessary, not sufficient.

petralithic 5 days ago | parent | next [-]

Their point was that no amount of statistical mitigation is enough, the only way to win the game is to not play, ie not build the thing you're trying to build.

But of course, I imagine Brave has invested to some significant extent in this, therefore you have to make this work by whatever means, according to your executives.

jrflowers 5 days ago | parent | prev | next [-]

But you don’t think that, fundamentally, giving software that can hallucinate the ability to use your credit card to buy plane tickets, is a bad idea?

It kind of seems like the only way to make sure a model doesn’t get exploited and empty somebody’s bank account would be “We’re not building that feature at all. Agentic AI stuff is fundamentally incompatible with sensible security policies and practices, so we are not putting it in our software in any way”

ec109685 5 days ago | parent | prev | next [-]

This statement on your post seems to say it would definitively prevent this class of attacks:

“In our analysis, we came up with the following strategies which could have prevented attacks of this nature. We’ll discuss this topic more fully in the next blog post in this series.”

cowboylowrez 5 days ago | parent | prev [-]

what you're saying is that the described step, "model alignment" is necessary even though it will fail a percentage of the time. whenever I see something that is "necessary" but doesn't have like a dozen 9's for reliability against failure or something well lets make that not necessary then. whadya say?

skaul 5 days ago | parent [-]

That's not how defense-in-depth works. If a security mitigation catches 90% of the "easy" attacks, that's worth doing, especially when trying to give users an extremely powerful capability. It just shouldn't be the only security measure you're taking.

MattPalmer1086 5 days ago | parent | next [-]

Defence in depth means you have more than one security control. But the LLM cannot be regarded as a security control in the first place; it's the thing you are trying to defend against.

If you tried to cast an unreliable insider as part of your defence in depth strategy (because they aren't totally unreliable), you would be laughed out of the room in any security group I've ever worked with.

kbrkbr 4 days ago | parent | next [-]

I am sure that's what you mean, but I think it is important to state it explicitly every now and then:

> Defence in depth means you have more than one security control

that overlap. Having them strictly parallel is not defense in depth (e.g. on one door to the same room a dog, and on a different unconnected door a guard).

cowboylowrez 5 days ago | parent | prev [-]

call it "vibe security" lol

cowboylowrez 5 days ago | parent | prev [-]

sure sure, except llms. I mean its valid and all bringing up tried and true maxims that we all should know regarding software, but whens the last time the ssl guys were happy with a fix that "has a chance of working, but a chance of not working."

defense in depth is to prevent one layer failure from getting to the next, you know, exploit chains etc. Failure in a layer is a failure, not statistically expected behavior. we fix bugs. what we need to do is treat llms as COMPLETELY UNTRUSTED user input as has been pointed out here and elsewhere time and again.

you reply to me like I need to be lectured, so consider me a dumb student in your security class. what am I missing here?

skaul 4 days ago | parent | next [-]

> you reply to me like I need to be lectured

That's not my intention! Just stating how we're thinking about this.

> defense in depth is to prevent one layer failure from getting to the next

We think a separate model can help with one layer of this: checking if the planner model's actions are aligned with the user's request. But we also need guarantees at other layers, like distinguishing web contents from user instructions, or locking down what tools the model has access to in what context. Fundamentally, though, like we said in the blog post:

"The attack we developed shows that traditional Web security assumptions don’t hold for agentic AI, and that we need new security and privacy architectures for agentic browsing."

ModernMech 5 days ago | parent | prev | next [-]

> what am I missing here?

I guess what I don't understand is that failure is always expected because nothing is perfect, so why isn't the chance of failure modeled and accounted for? Obviously you fix bugs, but how many more bugs are in there you haven't fixed? To me, "we fix bugs" sounds the same as "we ship systems with unknown vulnerabilities".

What's the difference between a purportedly "secure" feature with unknown, unpatched bugs; and an admittedly insecure feature whose failure modes are accounted for through system design taking that insecurity into account, rather than pretending all is well until there's a problem that surfaces due to unknown exploits?

jrflowers 5 days ago | parent | prev [-]

> what am I missing here?

Yeah the tone of that response seems unnecessarily smug.

“I’m working on removing your front door and I’m designing a really good ‘no trespassing’ sign. Only a simpleton would question my reasoning on this issue”

cma 5 days ago | parent | prev | next [-]

I think if you let claude code go wild with auto approval something similar could happen, since it can search the web and has the potential for prompt injection in what it reads there. Even without auto approval on reading and modifying files, if you aren't running it in a sandbox it could write code that then modifies your browser files the next time you do something like run your unit tests that it made, if you aren't reviewing every change carefully.

darepublic 5 days ago | parent | next [-]

I really don't get why you would use a coding agent in yolo mode. I use the llm code gen in chunks at least glancing over it each time I add something. Why the hell would you have an approach of AI take the wheel

threecheese 5 days ago | parent | next [-]

It depends on what you are using it for; I use CC for producing code that’s run elsewhere, but have also found it’s useful for producing code and commands behind day to day sysadmin/maintenance tasks. I don’t actually allow it to YOLO in this case (I have a few brain cells left), but the fact that it’s excellent at using bash suggests there are some terminal-based computer use tasks it could be useful for, or some set of useful tasks that might be considered harmful on your laptop but much less so in a virtual machine or container.

cma 4 days ago | parent | prev | next [-]

If you are only glancing over it and not doing a detailed review I think you could get hit with a prompt injection in the way I mentioned, with it writing something into the code that then when you run tests or the app ends up doing the action, which could be spinning up another claude code instance with approval off or turning off safety hooks etc.

darepublic 4 days ago | parent [-]

The prompt injection would come from where? If I am chatting with the llm and directly copy paste where is the injection. It would have to ge a malicious llm response but that is much much less likely than when you scrape third party sites or documents

cma 3 days ago | parent [-]

The prompt injection would come when Claude code searches the web. What it then slips in the code would get there when you approve the edit without carefully looking at it, it can be in one line that fetches a payload somewhere else. The execution would come when you run the program you are building or its unit tests or even when you do a build if it is slipped into a make file.

ec109685 5 days ago | parent | prev | next [-]

It still keeps you in the loop, but doesn’t ask to run shell commands, etc.

jameshart 2 days ago | parent [-]

That seems like a bad default. VSCode’s agent mode requires approval for shell commands every time by default, with a whitelisting capability (which is itself risky, because hiding shell commands in args to an executable is quite doable). Are people running agents under their own user identity without supervising the commands they run?

cma 2 days ago | parent [-]

The default is ask for approval with option to whitelist certain commands.

szundi 4 days ago | parent | prev [-]

[dead]

veganmosfet 5 days ago | parent | prev [-]

I tried this on Gemini CLI and it worked, just add some magic vibes ;-)

ngcazz 5 days ago | parent | prev | next [-]

> Instead they believe model alignment, trying to understand when a user is doing a dangerous task, etc. will be enough.

In other words: motivated reasoning.

ryanjshaw 5 days ago | parent | prev | next [-]

Maybe the article was updated but right now it says “The browser should isolate agentic browsing from regular browsing”

ec109685 5 days ago | parent | next [-]

That was my point about dropping privileges. It can still be exploited if the summary contains a link to an image that the attacker can control via text on the page that the LLM sees. It’s just a lot of Swiss cheese.

That said, it’s definitely the best approach listed. And turns that exploit into an XSS attack on reddit.com, which is still bad.

skaul 5 days ago | parent | prev | next [-]

That was in the blog from the starting, and it's also the most important mitigation we identified immediately when starting to think about building agentic AI into the browser. Isolating agentic browsing while still enabling important use-cases (which is why users want to use agentic browsing in the first place) is the hard part, which is presumably why many browsers are just rolling out agentic capabilities in regular browsing.

waterproof 4 days ago | parent [-]

Isn't there a situation where the agentic browser, acting correctly on behalf of the user, needs to send Bitcoin or buy plane tickets? Isn't that flexibility kind of the whole point of the system? If so, I don't see what you get by distinguishing between agentic and no agentic browsing.

Bad actors will now be working to scam users' LLMs rather than the users themselves. You can use more LLMs to monitor the LLMs and try and protect them, but it's turtles all the way down.

The difference: when someone loses their $$$, they're not a fool for falling for some Nigerian Prince wire scam themselves, they're just a fool for using your browser.

Or am I missing something?

skaul 4 days ago | parent [-]

You're right that if the user logs into a sensitive website, the "isolated browsing" mitigation stops helping. We don't want the user to accidentally end up in that state though. Separately, I can also imagine use-cases for agentic browsing where the user doesn't have to be logged into sensitive websites. Summarizing Hacker News front page, for one.

mapontosevenths 4 days ago | parent | prev [-]

Tabs in general should be security boundaries. Anything else should propmt for permission.

ivape 5 days ago | parent | prev | next [-]

A smart performant local model will be the equivalent of having good anti-virus and firewall software. It will be the only thing between you and wrong prompts being sent every which way from which app.

We’re probably three or four years away from the hardware necessary for this (NPUs in every computer).

ec109685 5 days ago | parent [-]

A local LLM wouldn’t have helped at all here.

ivape 5 days ago | parent [-]

You can’t imagine a MITM LLM that sits between you and the world?

QuadmasterXLII 5 days ago | parent | next [-]

Local llms can get offline searched for vulnerabilities using gradient based attacks. they will always be very easy to prompt inject.

solid_fuel 4 days ago | parent | prev [-]

I can't imagine how such a thing would _help_, it seems like it would just be another injection target.

petralithic 5 days ago | parent | prev [-]

> It’s interesting that in Brave’s post describing this exploit, they didn’t reach the fundamental conclusion this is a bad idea

"It is difficult to get a man to understand something, when his salary depends on his not understanding it." - Upton Sinclair

jazzyjackson 4 days ago | parent [-]

"If there's a steady paycheck in it, I'll believe anything you say." -Winston Zeddemore