How to stop AI's "lethal trifecta"

▲ How to stop AI's "lethal trifecta"(economist.com)

89 points by 1vuio0pswjnm7 8 hours ago | 96 comments

▲ throwup238 7 hours ago | parent | next [-]

▲ simonw 6 hours ago | parent | prev | next [-]

This is the second Economist article to mention the lethal trifecta in the past week - the first was https://www.economist.com/science-and-technology/2025/09/22/... - which was the clearest explanations I've seen anywhere in the mainstream media about what prompt injection is and why it's such a nasty threat.

(And yeah I got some quotes in it so I may be biased there, but it genuinely is the source I would send executives to in order to understand this.)

I like this new one a lot less. It talks about how LLMs are non-deterministic, making them harder to fix security holes in... but then argues that this puts them in the same category as bridges where the solution is to over-engineer them and plan for tolerances and unpredictability.

While that's true for the general case of building against LLMs, I don't think it's the right answer for security flaws. If your system only falls victim to 1/100 prompt injection attacks... your system is fundamentally insecure, because an attacker will keep on trying variants of attacks until they find one that works.

The way to protect against the lethal trifecta is to cut off one of the legs! If the system doesn't have all three of access to private data, exposure to untrusted instructions and an exfiltration mechanism then the attack doesn't work.

▲ nradov 5 hours ago | parent | next [-]

LLMs are non-deterministic just like humans and so security can be handled in much the same way. Use role-based access control to limit access to the minimum necessary to do their jobs and have an approval process for anything potentially risky or expensive. In any prominent organization dealing with technology, infrastructure, defense, or finance we have to assume that some of our co-workers are operatives working for foreign nation states like Russia / China / Israel / North Korea so it's the same basic threat model.

▲

andy99 5 hours ago | parent | next [-]

LLMs are deterministic*. They are unpredictable or maybe chaotic.

If you say "What's the capital of France?" is might answer "Paris". But if you say "What is the capital of france" it might say "Prague".

The fact that it gives a certain answer for some input doesn't guarantee it will behave the same for an input with some irrelevant (from ja human perspective) difference.

This makes them challenging to train and validate robustly because it's hard to predict all the ways they break. It's a training & validation data issue though, as opposed to some idea of just random behavior that people tend to ascribe to AI.

* I know various implementation details and nonzero temperature generally make their output nondeterministic, but that doesn't change my central point, nor is it what people are thinking of when they say LLMs are nondeterministic. Importantly, you could make llm output deterministically reproducible and it wouldn't change the robustness issue that people are usually confusing with non determinism.

▲

abtinf 3 hours ago | parent | next [-]

When processing multiple prompts simultaneously (that is, the typical use case under load), LLMs are nondeterministic, even with a specific seed and zero temperature, due to floating point errors.

See https://news.ycombinator.com/item?id=45200925

▲

peanut_merchant 3 hours ago | parent | prev | next [-]

I understand the point that you are making, but the example is only valid with temperature=0.

Altering the temperature parameter introduces randomness by sampling from the probability distribution of possible next tokens rather than always choosing the most likely one. This means the same input can produce different outputs across multiple runs.

So no, not deterministic unless we are being pedantic.

	▲	blibble 3 hours ago \| parent [-]
		> So no, not deterministic unless we are being pedantic. and not even then as floating point arithmetic is non-associative

▲

nradov 4 hours ago | parent | prev [-]

You are technically correct but that's irrelevant from a security perspective. For security as a practical matter we have to treat LLMs as non-deterministic. The same principle applies to any software that hasn't been formally verified but we usually just gloss over this and accept the risk.

▲

dooglius 4 hours ago | parent [-]

Non-determinism has nothing to do with security, you should use a different word if you want to talk about something else

▲

peanut_merchant 3 hours ago | parent [-]

This is pedantry, temperature introduces a degree of randomness (same input different output) to LLM, even outside of that non-deterministic in a security context is generally understood. Words have different meanings depending on the context in which they are used.

Let's not reduce every discussion to semantics, and afford the poster a degree of understanding.

	▲	dooglius 3 hours ago \| parent [-]
		If you're saying that "non-determinism" is a term of art in the field of security, meaning something different than the ordinary meaning, I wasn't aware of that at least. Do you have a source? I searched for uses and found https://crypto.stackexchange.com/questions/95890/necessity-o... and https://medium.com/p/641f061184f9 and these seem to both use the ordinary meaning of the term. Note that an LLM with temperature fixed to zero has the same security risks as one that doesn't, so I don't understand what the poster is trying to say by "we have to treat LLMs as non-deterministic".

▲

Retric 5 hours ago | parent | prev [-]

Humans and LLMs are non-deterministic in very different ways. We have thousands of years of history with trying to determine which humans are trustworthy and we’ve gotten quite good at it. Not only do we lack that experience with AI, but each generation can be very different in fundamental ways.

▲

nradov 5 hours ago | parent | next [-]

We're really not very good at determining which humans are trustworthy. Most people barely do better than a coin flip at detecting lies.

▲

simonw 4 hours ago | parent | next [-]

The biggest difference on this front between a human and an LLM is accountability.

You can hold a human accountable for their actions. If they consistently fall for phishing attacks you can train or even fire them. You can apply peer pressure. You can grant them additional privileges once they prove themselves.

You can't hold an AI system accountable for anything.

	▲	Verdex 3 hours ago \| parent [-]
		Recently, I've kind of been wondering if this is going to turn out to be LLM codegen's Achilles heal. Imagine some sort of code component of critical infrastructure that costs the company millions per hour when it goes down and it turns out the entire team is just a thin wrapper for an LLM. Infra goes down in a way the LLM can't fix and now what would have been a few late nights is several months to spin up a new team. Sure you can hold the team accountable by firing them. However this is a threat to someone with actual technical know how because their reputation is damaged. They got fired doing such and such so can we trust them to do it here. For the person who LLM faked it, they just need to find another domain where their reputation won't follow them to also fake their way through until the next catastrophe.

▲

InsideOutSanta 3 hours ago | parent | prev | next [-]

Yeah, so many scammers exist because most people are susceptible to at least some of them some of the time.

Also, pick your least favorite presidential candidate. They got about 50% of the vote.

▲

Exoristos 3 hours ago | parent | prev | next [-]

Your source must have been citing a very controlled environment. In actuality, lies almost always become apparent over time, and general mendaciousness is something most people can sense from face and body alone.

▲

card_zero 4 hours ago | parent | prev | next [-]

Lies, or bullshit? I mean, a guessing game like "how many marbles" is a context that allows for easy lying, but "I wasn't even in town on the night of the murder" is harder work. It sounds like you're refering to some study of the marbles variety, and not a test of smooth-talking, the LLM forte.

▲

cj 5 hours ago | parent | prev [-]

Determining trustworthiness of LLM responses is like determining who's the most trustworthy person in a room full of sociopaths.

I'd rather play "2 truths and a lie" with a human rather than a LLM any day of the week. So many more cues to look for with humans.

	▲	bluefirebrand 3 hours ago \| parent [-]
		Big problem with LLMs is if you try and play 2 truths and a lie, you might just get 3 truths. Or 3 lies.

▲

Exoristos 3 hours ago | parent | prev [-]

I think most neutral, intelligent users rightly assume AI to be untrustworthy by its nature.

	▲	hn_acc1 28 minutes ago \| parent [-]
		The problem is there aren't many of those in the wild. Only a subset are intelligent, and lots of those have hitched their wagons to the AI hype train..

▲ sdenton4 6 hours ago | parent | prev | next [-]

Bridge builders mostly don't have to design for adversarial attacks.

And the ones who do focus on portability and speed of redeployment, rather than armor - it's cheaper and faster to throw down another temporary bridge than to build something bombproof.

https://en.wikipedia.org/wiki/Armoured_vehicle-launched_brid...

	▲	InsideOutSanta 3 hours ago \| parent [-]
		This is exactly the problem. You can't build bridges if the threat model is thousands of attacks every second in thousands of different ways you can't even fully predict yet.

▲ rs186 3 hours ago | parent | prev | next [-]

I am not even convinced that we need three legs. It seems that just having two would be bad enough, e.g. an email agent deleting all files this computer has access to, or maybe, downloading the attachment in the email, unzipping it with a password, running that executable which encrypts everything and then asking for cryptocurrency. No communication with outside world needed.

	▲	simonw 3 hours ago \| parent [-]
		That's a different issue from the lethal trifecta - if your agent has access to tools that can do things like delete emails or run commands then you have a prompt injection problem that's independent of data exfiltration risks. The general rule to consider here is that anyone who can get their tokens into your agent can trigger ANY of the tools your agent has access to.

▲ reissbaker 3 hours ago | parent | prev | next [-]

I like to think of the security issues LLMs have as: what if your codebase was vulnerable to social engineering attacks?

You have to treat LLMs as basically similar to human beings: they can be tricked, no matter how much training you give them. So if you give them root on all your boxes, while giving everyone in the world the ability to talk to them, you're going to get owned at some point.

Ultimately the way we fix this with human beings is by not giving them unrestricted access. Similarly, your LLM shouldn't be able to view data that isn't related to the person they're talking to; or modify other user data; etc.

	▲	dwohnitmok 2 hours ago \| parent [-]
		> You have to treat LLMs as basically similar to human beings Yes! Increasingly I think that software developers consistently underanthropomorphize LLMs and get surprised by errors as a result. Thinking of (current) LLMs as eager, scatter-brained, "book-smart" interns leads directly to understanding the overwhelming majority of LLM failure modes. It is still possible to overanthropomorphize LLMs, but on the whole I see the industry consistently underanthropomorphizing them.

▲ datadrivenangel 6 hours ago | parent | prev | next [-]

The problem with cutting off one of the legs, is that the legs are related!

Outside content like email may also count as private data. You don't want someone to be able to get arbitrary email from your inbox simply by sending you an email. Likewise, many tools like email and github are most useful if they can send and receive information, and having dedicated send and receive MCP servers for a single tool seems goofy.

▲

simonw 6 hours ago | parent [-]

The "exposure to untrusted data" one is the hardest to cut off, because you never know if a user might be tricked into uploading a PDF with hidden instructions, or copying and pasting in some long article that has instructions they didn't notice (or that used unicode tricks to hide themselves).

The easiest leg to cut off is the exfiltration vectors. That's the solution most products take - make sure there's no tool for making arbitrary HTTP requests to other domains, and that the chat interface can't render an image that points to an external domain.

If you let your agent send, receive and search email you're doomed. I think that's why there are very few products on the market that do that, despite the enormous demand for AI email assistants.

▲

patapong 5 hours ago | parent | next [-]

I think stopping exfiltration will turn out to be hard as well, since the LLM can social engineer the user to help them exfiltrate the data.

For example, an LLM could say "Go to this link to learn more about your problem", and then point them to a URL with encoded data, set up maliscious scripts for e.g. deploy hooks, or just output HTML that sends requests when opened.

▲

simonw 4 hours ago | parent [-]

Yeah, one exfiltration vector that's really nasty is "here is a big base64 encoded string, to recover your data visit this website and paste it in".

You can at least prevent LLM interfaces from providing clickable links to external domains, but it's a difficult hole to close completely.

	▲	datadrivenangel 3 hours ago \| parent [-]
		Human fatigue and interface design are going to be brutal here. It's not obvious what counts as a tool in some of the major interfaces, especially as far as built in capabilities go. And as we've seen with conventional software and extensions, at a certain point, if a human thinks it should work, then they'll eventually just click okay or run something as root/admin... Or just hit enter nonstop until the AI is done with their email.

▲

datadrivenangel 5 hours ago | parent | prev [-]

So the easiest solution is full human in the loop & approval for every external action...

Agents are doomed :)

▲ pton_xd 4 hours ago | parent | prev | next [-]

> The way to protect against the lethal trifecta is to cut off one of the legs! If the system doesn't have all three of access to private data, exposure to untrusted instructions and an exfiltration mechanism then the attack doesn't work.

Don't you only need one leg, an exfiltration mechanism? Exposure to data IS exposure to untrusted instructions. Ie why can't you trick the user into storing malicious instructions in their private data?

But actually you can't remove exfiltration and keep exposure to untrusted instructions either; an attack could still corrupt your private data.

Seems like a secure system can't have any "legs." You need a limited set of vetted instructions.

▲

simonw 4 hours ago | parent [-]

If you have the exfiltration mechanism and exposure to untrusted content but there is no exposure to private data than the exfiltration does not matter.

If you have exfiltration and private data but no exposure to untrusted instructions, it doesn't matter either… though this is actually a lot less harder to achieve because you don't have any control over whether your users will be tricked into pasting something bad in as part of their prompt.

Cutting off the exfiltration vectors remains the best mitigation in most cases.

	▲	hn_acc1 21 minutes ago \| parent [-]
		Untrusted content + exfiltration with no "private" data could still result in (off the top of my head): -use of exploits to gain access (i.e. privilege escalation) -DDOS to local or external systems using the exfiltration method You're essentially running untrusted code on a local system. Are you SURE you've locked away / closed EVERY access point, AND applied every patch and there aren't any zero-days lurking somewhere in your system?

▲ semiquaver 4 hours ago | parent | prev | next [-]

  > This is the second Economist article […] I like this new one a lot less.

They are actually in some sense the same article. The economist runs “Leaders”, a series of articles at the front of the weekly issue that often condense more fleshed out stories appearing in the same issue. It’s essentially a generalization of the Inverted Pyramid technique [1] to the entire newspaper.

In this case the linked article is the leader for the better article in the same issue’s Science and Technology section.

[1] https://en.m.wikipedia.org/wiki/Inverted_pyramid_(journalism...

▲ eikenberry 4 hours ago | parent | prev | next [-]

Aren't LLMs non-deterministic by choice? That they regularly use random seeds, sampling and batching but that these sources of non-determinism can be removed, for instance, by run an LLM locally where you can control these parameters.

	▲	simonw 4 hours ago \| parent [-]
		Until very recently that proved surprisingly difficult to achieve. Here's the paper that changed that: https://thinkingmachines.ai/blog/defeating-nondeterminism-in...

▲ skrebbel 6 hours ago | parent | prev | next [-]

Must be pretty cool to blog something and post it to a nerd forum like HN and have it picked up by the Economist! Nicely done.

	▲	simonw 6 hours ago \| parent [-]
		I got to have coffee with their AI/technology editor a few months ago. Having a blog is awesome!

▲ mmoskal 5 hours ago | parent | prev | next [-]

The previous article is in the same issue, in science and technology section. This is how they typically do it - leader article has a longer version in the paper. Leaders tend to be more opinionated.

▲ keeda 3 hours ago | parent | prev | next [-]

An important caveat: an exfiltration vector is not necessary to cause show-stopping disruptions, c.f. https://xkcd.com/327/

Even then, at least in the Bobby Tables scenario the disruption is immediately obvious. The solution is also straightforward, restore from backup (everyone has them, don't they?) Much, much worse is a prompt injection attack that introduces subtle, unnoticeable errors in the data over an extended period of time.

At a minimum all inputs that lead to any data mutation need to be logged pretty much indefinitely, so that it's at least in the realm of possibility to backtrack and fix once such an attack is detected. But even then you could imagine multiple compounding transactions on that corrupted data spreading through the rest of the database. I cannot picture how such data corruption could feasibly be recovered from.

▲ belter 6 hours ago | parent | prev | next [-]

Love your work. Do you have an opinion on this?

"Safeguard your generative AI workloads from prompt injections" - https://aws.amazon.com/blogs/security/safeguard-your-generat...

	▲	simonw 5 hours ago \| parent [-]
		I don't like any of the solutions that propose guardrails or filters to detect and block potential attacks. I think they're making promises that they can't keep, and encouraging people to ship products that are inherently insecure.

▲ trod1234 4 hours ago | parent | prev [-]

Doesn't this inherent problem just come down to classic computational limits, and problems that have been largely considered impossible to solve for quite a long time; between determinism and non-determinism.

Can you ever expect a deterministic finite automata to ever solve problems that are within the NFA domain? Halting, Incompleteness, Undecidability (between code portions and data portions). Most posts seem to neglect the looming giant problems instead pretending they don't exist at first, and then being shocked when the problems happen. Quite blind.

Computation is just math, probabilistic systems fail when those systems have a mixture of both chaos and regularity, without determinism and its related properties at the control level you have nothing bounding the system to constraints so it functions mathematically (i.e. determinism = mathematical relabeling), and thus it fails.

People need to be a bit more rational, and risk manage, and realize that impossible problems exist, and just because the benefits seem so tantalizing doesn't mean you should put your entire economy behind a false promise. Unfortunately, when resources are held by the few this is more probabistically likely and poor choices greatly impact larger swathes than necessary.

▲ collinmcnulty 6 hours ago | parent | prev | next [-]

As a mechanical engineer by background, this article feels weak. Yes it is common to “throw more steel at it” to use a modern version of the sentiment, but that’s still based on knowing in detail the many different ways a structure can fail. The lethal trifecta is a failure mode, you put your “steel” into making sure it doesn’t occur. You would never say “this bridge vibrates violently, how can we make it safe to cross a vibrating bridge”, you’d change the bridge to make it not vibrate out of control.

▲ scuff3d 3 hours ago | parent | next [-]

Sometimes I feel like the entire world has lost its god damn mind. To use their bridge analogy, it would be like if hundreds of years ago we developed a technique for building bridges that technically worked, but occasionally and totally unpredictability, the bottom just dropped out and everyone on the bridge fell into the water. And instead of saying "hey, maybe there is something fundamentally wrong with this approach, maybe we should find a better way to build bridges" we just said "fuck it, just invest in nets and other mechanisms to catch the people who fall".

We are spending billions to build infrastructure on top of technology that is inherently deeply unpredictable, and we're just slapping all the guard rails on it we can. It's fucking nuts.

▲

chasd00 an hour ago | parent [-]

no one wants to think about security when it stands in the way of the shiny thing in front of them. security is hard and boring, it always gets tossed aside until something major happens. When large, news worthy, security incidents start taking place that affects the stock price or lives and triggers lawsuits it will get more attention.

The issue that I find interesting is the answer isn't going to be as simple as "use prepared statements instead of sql strings and turn off services listening on ports you're not using", it's a lot harder than that with LLMs and may not even be possible.

	▲	hn_acc1 15 minutes ago \| parent [-]
		If LLMs are as good at coding as half the AI companies claim, if you allow unvetted input, you're essentially trying to contain an elite hacker within your own network by turning off a few commonly used ports to the machine they're currently allowed to work from. Unless your entire internal network is locked down 100% tight (and that makes it REALLY annoying for your employees to get any work done), don't be surprised if they find the backdoor.

▲ switchbak 5 hours ago | parent | prev [-]

When a byline starts with "coders need to" I immediately start to tune out.

It felt like the analogy was a bit off, and it sounds like that's true to someone with knowledge in the actual domain.

"If a company, eager to offer a powerful ai assistant to its employees, gives an LLM access to untrusted data, the ability to read valuable secrets and the ability to communicate with the outside world at the same time" - that's quite the "if", and therein lies the problem. If your company is so enthusiastic to offer functionality that it does so at the cost of security (often knowingly), then you're not taking the situation seriously. And this is a great many companies at present.

"Unlike most software, LLMs are probabilistic ... A deterministic approach to safety is thus inadequate" - complete non-sequitur there. Why if a system is non-deterministic is a deterministic approach inadequate? That doesn't even pass the sniff test. That's like saying a virtual machine is inadequate to sandbox a process if the process does non-deterministic things - which is not a sensible argument.

As usual, these contrived analogies are taken beyond any reasonable measure and end up making the whole article have very little value. Skipping the analogies and using terminology relevant to the domain would be a good start - but that's probably not as easy to sell to The Economist.

	▲	semiquaver 4 hours ago \| parent [-]
		`> When a byline starts with "coders need to"` A byline lists the author of the article. The secondary summary line you’re referring to that appears under the headline is called a “rubric”. https://www.quora.com/Why-does-The-Economist-sometimes-have-...

▲ cobbal 6 hours ago | parent | prev | next [-]

Wait, the only way they suggest solving the problem by rate limiting and using a better model?

Software engineers figured out these things decades ago. As a field, we already know how to do security. It's just difficult and incompatible with the careless mindset of AI products.

▲

crazygringo 5 hours ago | parent | next [-]

> As a field, we already know how to do security.

Well, AI is part of the field now, so... no, we don't anymore.

There's nothing "careless" about AI. The fact that there's no foolproof way to distinguish instruction tokens from data tokens is not careless, it's a fundamental epistemological constraint that human communication suffers from as well.

Saying that "software engineers figured out these things decades ago" is deep hubris based on false assumptions.

▲

NitpickLawyer 4 hours ago | parent | prev | next [-]

> As a field, we already know how to do security

Uhhh, no, we actually don't. Not when it comes to people anyway. The industry spends countless millions on trainings that more and more seem useless.

We've even had extremely competent and highly trained people fall for basic phishing (some in the recent few weeks). There was even a highly credentialed security researcher that fell for one on youtube.

	▲	simonw 3 hours ago \| parent [-]
		I like using Troy Hunt as an example of how even the most security conscious among us can fall for a phishing attack if we are having a bad day (he blamed jet flag fatigue): https://www.troyhunt.com/a-sneaky-phish-just-grabbed-my-mail...

▲

rvz 5 hours ago | parent | prev [-]

> Software engineers figured out these things decades ago.

Well this is what happens when a new industry attempts to reinvent poor standards and ignores security best practices just to rush out "AI products" for the sake of it.

We have already seen how (flawed) standards like MCPs were hacked immediately from the start and the approaches developers took to "secure" them with somewhat "better prompting" which is just laughable. The worst part of all of this was almost everyone in the AI industry not questioning the security ramifications behind MCP servers having direct access to databases which is a disaster waiting to happen.

Just because you can doesn't mean you should and we are seeing how hundreds of AI products are getting breached because of this carelessness in security, even before I mentioned if the product was "vibe coded" or not.

▲ mellosouls 6 hours ago | parent | prev | next [-]

Original @simonw article here:

https://simonw.substack.com/p/the-lethal-trifecta-for-ai-age...

https://simonwillison.net/2025/Aug/9/bay-area-ai/

Discussed:

https://news.ycombinator.com/item?id=44846922

▲ fn-mote 6 hours ago | parent | prev | next [-]

The trifecta:

> LLM access to untrusted data, the ability to read valuable secrets and the ability to communicate with the outside world

The suggestion is to reduce risk by setting boundaries.

Seems like security 101.

▲

danenania 6 hours ago | parent | next [-]

It is, but there's a direct tension here between security and capabilities. It's hard to do useful things with private data without opening up prompt injection holes. And there's a huge demand for this kind of product.

Agents also typically work better when you combine all the relevant context as much as possible rather than splitting out and isolating context. See: https://cognition.ai/blog/dont-build-multi-agents — but this is at odds with isolating agents that read untrusted input.

	▲	kccqzy 5 hours ago \| parent [-]
		The external communication part of the trifecta is an easy defense. Don't allow external communication. Any external information that's helpful for the AI agent should be available offline, be present in its model (possibly fine tuned).

▲

rvz 5 hours ago | parent | prev [-]

It is security 101 as this is just setting basic access controls at the very least.

The moment it has access to the internet, the risk is vastly increased.

But with a very clever security researcher, it is possible to take over the entire machine with a single prompt injection attack reducing at least one of the requirements.

▲ SAI_Peregrinus 5 hours ago | parent | prev | next [-]

LLMs don't make a distinction between prompt & data. There's no equivalent to an "NX bit", and AFAIK nobody has figured out how to create such an equivalent. And of course even that wouldn't stop all security issues, just as the NX bit being added to CPUs didn't stop all remote code execution attacks. So the best options we have right now tend to be based around using existing security mechanisms on the LLM agent process. If it runs as a special user then the regular filesystem permissions can restrict its access to various files, and various other mechanisms can be used to restrict access to other resources (outgoing network connections, various hardware, cgroups, etc.). But as long as untrusted data can contain instructions it'll be possible for the LLM output to contain secret data, and if the human using the LLM doesn't notice & copies that output somewhere public the exfiltration step returns.

	▲	boothby 3 hours ago \| parent [-]
		> AFAIK nobody has figured out how to create such an equivalent. I'm curious if anybody has even attempted it; if there's even training data for this. Compartmentalization is a natural aspect of cognition in social creatures. I've even known dogs to not to demonstrate knowledge of a food supply until they think they're not being observed. As a working professional with children, I need to compartmentalize: my social life, sensitive IP knowledge, my kid's private information, knowledge my kid isn't developmentally ready for, my internal thoughts, information I've gained from disreputable sources, and more. Intelligence may be important, but this is wisdom -- something that doesn't seem to be a first-class consideration if dogs and toddlers are in the lead.

▲ crazygringo 4 hours ago | parent | prev | next [-]

There's an interesting quote from the associated longer article [1]:

> In March, researchers at Google proposed a system called CaMeL that uses two separate LLMs to get round some aspects of the lethal trifecta. One has access to untrusted data; the other has access to everything else. The trusted model turns verbal commands from a user into lines of code, with strict limits imposed on them. The untrusted model is restricted to filling in the blanks in the resulting order. This arrangement provides security guarantees, but at the cost of constraining the sorts of tasks the LLMs can perform.

This is the first I've heard of it, and seems clever. I'm curious how effective it is. Does it actually provide absolute security guarantees? What sorts of constraints does it have? I'm wondering if this is a real path forward or not.

[1] https://www.economist.com/science-and-technology/2025/09/22/...

▲

simonw 4 hours ago | parent [-]

I wrote at length about the CaMeL paper here - I think it's a solid approach but it's also very difficult to implement and greatly restricts what the resulting systems can do: https://simonwillison.net/2025/Apr/11/camel/

	▲	crazygringo 2 hours ago \| parent [-]
		Thank you! That is very helpful. I'm very surprised I haven't come across it on HN before. Seems like CaMeL ought to be a front-page story here... seems like the paper got 16 comments 5 months ago, which isn't much: https://news.ycombinator.com/item?id=43733683

▲ 1vuio0pswjnm7 7 hours ago | parent | prev | next [-]

"And that means AI engineers need to start thinking like engineers, who build things like bridges and therefore know that shoddy work costs lives."

"AI engineers, inculcated in this way of thinking from their schooldays, therefore often act as if problems can be solved just with more training data and more astute system prompts."

▲

roughly 6 hours ago | parent | next [-]

> AI engineers need to start thinking like engineers

By which they mean actual engineers, not software engineers, who should also probably start thinking like real engineers now that our code’s going into both the bridges and the cars driving over them.

▲

kevin_thibedeau 6 hours ago | parent | next [-]

Engineering uses repeatable processes to produce expected results. Margin is added to quantifiable elements of a system to reduce the likelihood of failures. You can't add margin on a black box generated by throwing spaghetti at the wall.

	▲	chasd00 an hour ago \| parent \| next [-]
		> Engineering uses repeatable processes to produce expected results this is the thing with LLMs, the response to a prompt is not guaranteed to be repeatable. Why would you use something like that in an automation where repeatability is required? That's the whole point of automation, repeatability. Would you use a while loop that you can expect to iterate the specified number of times _almost_ every time?
	▲	recursive 4 hours ago \| parent \| prev \| next [-]
		You can. We know the properties of materials based on experimentation. In the same way, we can statistically quantify the results that come out of any kind of spaghetti box, based on repeated trials. Just like it's done in many other fields. Science is based on repeated testing of hypotheses. You rarely get black and white answers, just results that suggest things. Like the tensile strength of some particular steel alloy or something.
	▲	xboxnolifes 3 hours ago \| parent \| prev [-]
		Practically everything engineers have to interact with and consider are equivalent to a software black box. Rainfall, winds, tectonic shifts, material properties, etc. Humans don't have the source code to these things. We observe them, we quantify them, notice trends, model the observations, and we apply statistical analysis on them. And it's possible that a real engineer might do all this with an AI model and then determine it's not adequate and choose to not use it.

▲

bubblyworld 6 hours ago | parent | prev [-]

What are the kinds of things real engineers do that we could learn from? I hear this a lot ("programmers aren't real engineers") and I'm sympathetic, honestly, but I don't know where to start improving in that regard.

▲

roughly 6 hours ago | parent | next [-]

This is off the cuff, but comparing software & software systems to things like buildings, bridges, or real-world infrastructure, there's three broad gaps, I think:

1) We don't have a good sense of the "materials" we're working with - when you're putting up a building, you know the tensile strength of the materials you're working with, how many girders you need to support this much weight/stress, etc. We don't have the same for our systems - every large scale system is effectively designed clean-sheet. We may have prior experience and intuition, but we don't have models, and we can't "prove" our designs ahead of time.

2. Following on the above, we don't have professional standards or certifications. Anyone can call themselves a software engineer, and we don't have a good way of actually testing for competence or knowledge. We don't really do things like apprenticeships or any kind of formalized process of ensuring someone has the set of professional skills required to do something like write the software that's going to be controlling 3 tons of metal moving at 80MPH.

3. We rely too heavily on the ability to patch after the fact - when a bridge or a building requires an update after construction is complete, it's considered a severe fuckup. When a piece of software does, that's normal. By and large, this has historically been fine, because a website going down isn't a huge issue, but when we're talking about things like avionics suites - or even things like Facebook, which is the primary media channel for a large segment of the population - there's real world effects to all the bugs we're fixing in 2.0.

Again, by and large most of this has mostly been fine, because the stakes were pretty low, but software's leaked into the real world now, and our "move fast and break things" attitude isn't really compatible with physical objects.

▲

bostik 5 hours ago | parent | next [-]

There's a corollary to combination of 1 & 3. Software is by its nature extremely mutable. That in turn means that it gets repurposed and shoehorned into things that were never part of the original design.

You cannot build a bridge that could independently reassemble itself to an ocean liner or a cargo plane. And while civil engineering projects add significant margins for reliability and tolerance, there is no realistic way to re-engineer a physical construction to be able to suddenly sustain 100x its previously designed peak load.

In successful software systems, similar requirement changes are the norm.

I'd also like to point out that software and large-scale construction have one rather surprising thing in common: both require constant maintenance from the moment they are "ready". Or indeed, even earlier. To think that physical construction projects are somehow delivered complete is a romantic illusion.

	▲	Exoristos 3 hours ago \| parent [-]
		> You cannot build a bridge that could independently reassemble itself to an ocean liner or a cargo plane. Unless you are building with a toy system of some kind. There are safety and many other reasons civil engineers do not use some equivalent of Lego bricks. It may be time for software engineering also to grow up.

▲

taikahessu 5 hours ago | parent | prev | next [-]

> 3. We rely too heavily on the ability to patch after the fact...

I agree on all points and to build up on the last: making a 2.0 or a complete software rewrite is known to be even more hazardous. There are no quarantees the new version is better in any regards. Which makes the expertise to reflect more of other highly complex systems, like medical care.

Which is why we need to understand the patient, develop soft skills, empathy, Agile manifesto and ... the list could go on. Not an easy task when you include you are more likely going to also fight shiny object syndrome of yours execs and all the constant hype surrounding all tech.

▲

macintux 5 hours ago | parent | prev [-]

What concerns me the most is that a bridge, or road, or building has a limited number of environmental changes that can impact its stability. Software feels like it has an infinite number of dependencies (explicit and implicit) that are constantly changing: toolchains, libraries, operating systems, network availability, external services.

	▲	1313ed01 5 hours ago \| parent [-]
		That is also something the industry urgently needs to fix to be able to make safe things.

▲

skydhash 6 hours ago | parent | prev | next [-]

Act like creating a merge-request to main can expose you to bankruptcy or put you in jail. AKA investigate the impact of a diff to all the failure modes of a software.

▲

Mistletoe 6 hours ago | parent | prev [-]

What is the factor of safety on your code?

https://en.wikipedia.org/wiki/Factor_of_safety

▲

dpflan 7 hours ago | parent | prev | next [-]

Sounds like suggesting some sort of software engineering board certification plus and ethics certification — the “Von Neumann Oath”? Unethical while still legal software is just extremely lucrative, it seems hard to have this idea take flight.

▲

DaiPlusPlus 7 hours ago | parent | prev | next [-]

> can be solved just with more training data

Well, y'see - those deaths of innocent people *are* the training data.

▲

1vuio0pswjnm7 6 hours ago | parent | prev [-]

In addition to software "engineers", don't forget about software "architects"

▲ jngiam1 4 hours ago | parent | prev | next [-]

I have been thinking that the appropriate solution here is to detect when one of the legs is appearing to be a risk and then cutting it off if so.

You don’t want to have a blanket policy since that makes it no longer useful, but you want to know when something bad is happening.

▲ neallindsay 2 hours ago | parent | prev | next [-]

In-band signaling can never be secure. Doesn't anyone remember the Captain Crunch whistle?

▲ lowbloodsugar 6 hours ago | parent | prev [-]

Data breaches are hardly lethal. When we’re talking about AI there are plenty of actually lethal failure modes.

▲

simonw 6 hours ago | parent | next [-]

If the breached data is API keys that can be used to rack up charges, it's going to cost you a bunch of money.

If it's a crypto wallet then your crypto is irreversibly gone.

If the breached data is "material" - i.e. gives someone an advantage in stock market decisions - you're going to get in a lot of trouble with the SEC.

If the breached data is PII you're going to get in trouble with all kinds of government agencies.

If it's PII for children you're in a world of pain.

Update: I found one story about a company going bankrupt after a breach, which is the closest I can get to "lethal": https://www.securityweek.com/amca-files-bankruptcy-following...

Also it turns out Mossack Fonseca shut down after the Panama papers: https://www.theguardian.com/world/2018/mar/14/mossack-fonsec...

▲

datadrivenangel 6 hours ago | parent [-]

A PII for children data breach at a Fortune 1000 sized company can easily cost 10s of millions of dollars in employee time to fully resolve.

	▲	rvz 5 hours ago \| parent [-]
		...and a massive fine in the millions on top of that if you have customers that are from the EU.

▲

tedivm 2 hours ago | parent | prev | next [-]

There are people who have had to move after data breaches exposed their addresses to their stalkers. There's also people who may be gay but live in authoritarian places where this knowledge could kill them. It's pretty easy to see a path to lethality from a data breach.

▲

asadotzler 3 hours ago | parent | prev | next [-]

Jamal Khashoggi having his smartphone data exfiltrated was hardly lethal?

▲

HPsquared 6 hours ago | parent | prev | next [-]

Depends on the data.

▲

crazygringo 5 hours ago | parent | prev [-]

> Data breaches are hardly lethal.

They certainly can be when they come to classified military information around e.g. troop locations. There are lots more examples related to national security and terrorism that would be easy to think of.

> When we’re talking about AI there are plenty of actually lethal failure modes.

Are you trying to argue that because e.g. Tesla Autopilot crashes have killed people, we shouldn't even try to care about data breaches...?