| ▲ | simonw 7 hours ago |
| This is the second Economist article to mention the lethal trifecta in the past week - the first was https://www.economist.com/science-and-technology/2025/09/22/... - which was the clearest explanation I've seen anywhere in the mainstream media about what prompt injection is and why it's such a nasty threat. (And yeah I got some quotes in it so I may be biased there, but it genuinely is the source I would send executives to in order to understand this.) I like this new one a lot less. It talks about how LLMs are non-deterministic, making them harder to fix security holes in... but then argues that this puts them in the same category as bridges where the solution is to over-engineer them and plan for tolerances and unpredictability. While that's true for the general case of building against LLMs, I don't think it's the right answer for security flaws. If your system only falls victim to 1/100 prompt injection attacks... your system is fundamentally insecure, because an attacker will keep on trying variants of attacks until they find one that works. The way to protect against the lethal trifecta is to cut off one of the legs! If the system doesn't have all three of access to private data, exposure to untrusted instructions and an exfiltration mechanism then the attack doesn't work. |
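To put rough numbers on the 1/100 point, here is a small sketch. It assumes each attack variant is an independent try with the same 1% success rate, which is a simplification (a real attacker adapts between attempts); `chance_of_breach` is a name made up for this example. Under that assumption the chance of at least one success is already roughly 63% after 100 attempts.

```python
def chance_of_breach(per_attempt_success: float, attempts: int) -> float:
    """Probability that at least one of `attempts` independent tries succeeds.

    Assumption: attempts are independent and share the same success rate.
    """
    return 1.0 - (1.0 - per_attempt_success) ** attempts


if __name__ == "__main__":
    # A filter that stops 99% of attempts, probed a growing number of times.
    for attempts in (1, 100, 500):
        print(attempts, round(chance_of_breach(0.01, attempts), 3))
```

A tolerance that would be comfortable for a bridge gets worn down quickly when the "load" is someone retrying on purpose.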
|
| ▲ | sdenton4 7 hours ago | parent | next [-] |
| Bridge builders mostly don't have to design for adversarial attacks. And the ones who do focus on portability and speed of redeployment, rather than armor - it's cheaper and faster to throw down another temporary bridge than to build something bombproof. https://en.wikipedia.org/wiki/Armoured_vehicle-launched_brid... |
| |
| ▲ | InsideOutSanta 5 hours ago | parent [-] | | This is exactly the problem. You can't build bridges if the threat model is thousands of attacks every second in thousands of different ways you can't even fully predict yet. |
|
|
| ▲ | nradov 7 hours ago | parent | prev | next [-] |
| LLMs are non-deterministic just like humans and so security can be handled in much the same way. Use role-based access control to limit access to the minimum necessary to do their jobs and have an approval process for anything potentially risky or expensive. In any prominent organization dealing with technology, infrastructure, defense, or finance we have to assume that some of our co-workers are operatives working for foreign nation states like Russia / China / Israel / North Korea so it's the same basic threat model. |
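A minimal sketch of what "role-based access plus an approval step" could look like around an agent's tool calls. The role names, tool names and the `authorize` helper are all hypothetical, picked only for illustration; a real deployment would lean on its existing IAM system.

```python
from typing import Optional

# Hypothetical roles and the tools each one may call (least privilege).
ROLE_PERMISSIONS = {
    "support_agent": {"read_ticket", "reply_to_ticket"},
    "billing_agent": {"read_invoice", "issue_refund"},
}

# Hypothetical set of risky or expensive tools that need a human sign-off.
REQUIRES_APPROVAL = {"issue_refund"}


def authorize(role: str, tool: str, approved_by: Optional[str] = None) -> bool:
    """Return True only if the role may call the tool and any needed approval exists."""
    if tool not in ROLE_PERMISSIONS.get(role, set()):
        return False  # not part of this role's minimum necessary access
    if tool in REQUIRES_APPROVAL and approved_by is None:
        return False  # risky action without a named approver
    return True
```

The check sits outside the model, so it holds no matter what text the model was fed.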
| |
| ▲ | andy99 7 hours ago | parent | next [-] | | LLMs are deterministic*. They are unpredictable or maybe chaotic. If you say "What's the capital of France?" it might answer "Paris". But if you say "What is the capital of france" it might say "Prague". The fact that it gives a certain answer for some input doesn't guarantee it will behave the same for an input with some irrelevant (from a human perspective) difference. This makes them challenging to train and validate robustly because it's hard to predict all the ways they break. It's a training & validation data issue though, as opposed to some idea of just random behavior that people tend to ascribe to AI. * I know various implementation details and nonzero temperature generally make their output nondeterministic, but that doesn't change my central point, nor is it what people are thinking of when they say LLMs are nondeterministic. Importantly, you could make LLM output deterministically reproducible and it wouldn't change the robustness issue that people are usually confusing with nondeterminism. | | |
| ▲ | abtinf 4 hours ago | parent | next [-] | | When processing multiple prompts simultaneously (that is, the typical use case under load), LLMs are nondeterministic, even with a specific seed and zero temperature, due to floating point errors. See https://news.ycombinator.com/item?id=45200925 | | |
| ▲ | kragen an hour ago | parent [-] | | This is very interesting, thanks! > While this hypothesis is not entirely wrong, it doesn’t reveal the full picture. For example, even on a GPU, running the same matrix multiplication on the same data repeatedly will always provide bitwise equal results. We’re definitely using floating-point numbers. And our GPU definitely has a lot of concurrency. Why don’t we see nondeterminism in this test? |
| |
| ▲ | peanut_merchant 5 hours ago | parent | prev | next [-] | | I understand the point that you are making, but the example is only valid with temperature=0. Altering the temperature parameter introduces randomness by sampling from the probability distribution of possible next tokens rather than always choosing the most likely one. This means the same input can produce different outputs across multiple runs. So no, not deterministic unless we are being pedantic. | | |
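A toy sketch of the temperature point, using made-up logits rather than a real model. `sample_token` is a hypothetical helper: at temperature 0 it takes the most likely token every time, otherwise it samples from the temperature-scaled softmax distribution.

```python
import math
import random


def sample_token(logits, temperature, rng):
    """Pick a token index from raw scores; illustrative only, not a model API."""
    if temperature == 0:
        # Greedy decoding: always the highest-scoring token.
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [score / temperature for score in logits]
    top = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


if __name__ == "__main__":
    logits = [1.0, 3.0, 2.0]  # made-up scores for three candidate tokens
    rng = random.Random()     # unseeded, so sampled runs can differ
    print([sample_token(logits, 0, rng) for _ in range(5)])
    print([sample_token(logits, 1.0, rng) for _ in range(5)])
```

Fixing the seed makes the sampled run repeatable as well, which separates "random" from "unpredictable" in the sense discussed upthread.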
| ▲ | blibble 5 hours ago | parent [-] | | > So no, not deterministic unless we are being pedantic. and not even then as floating point arithmetic is non-associative |
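The non-associativity is easy to see with ordinary Python floats (IEEE 754 doubles). This only shows that grouping changes the result; how a particular inference stack orders its additions is outside what this sketch covers.

```python
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c   # add a and b first
right = a + (b + c)  # add b and c first

# Same three numbers, different grouping, different rounding along the way.
print(left == right)
print(repr(left), repr(right))
```

If the order of a reduction depends on batching or scheduling, tiny differences like this can flip a near-tie between two tokens.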
| |
| ▲ | nradov 6 hours ago | parent | prev [-] | | You are technically correct but that's irrelevant from a security perspective. For security as a practical matter we have to treat LLMs as non-deterministic. The same principle applies to any software that hasn't been formally verified but we usually just gloss over this and accept the risk. | | |
| ▲ | dooglius 5 hours ago | parent [-] | | Non-determinism has nothing to do with security, you should use a different word if you want to talk about something else | | |
| ▲ | peanut_merchant 5 hours ago | parent [-] | | This is pedantry. Temperature introduces a degree of randomness (same input, different output) to LLMs, and even setting that aside, what "non-deterministic" means in a security context is generally understood. Words have different meanings depending on the context in which they are used. Let's not reduce every discussion to semantics, and afford the poster a degree of understanding. | | |
| ▲ | dooglius 4 hours ago | parent [-] | | If you're saying that "non-determinism" is a term of art in the field of security, meaning something different than the ordinary meaning, I wasn't aware of that at least. Do you have a source? I searched for uses and found https://crypto.stackexchange.com/questions/95890/necessity-o... and https://medium.com/p/641f061184f9 and these seem to both use the ordinary meaning of the term. Note that an LLM with temperature fixed to zero has the same security risks as one that doesn't, so I don't understand what the poster is trying to say by "we have to treat LLMs as non-deterministic". |
|
|
|
| |
| ▲ | Terr_ 29 minutes ago | parent | prev | next [-] | | Even with a very charitable view of LLM document-building results, these "versus a human employee" comparisons tend to ignore important differences in scale/rate, timing security, and oversight structures. | |
| ▲ | Retric 7 hours ago | parent | prev [-] | | Humans and LLMs are non-deterministic in very different ways. We have thousands of years of history with trying to determine which humans are trustworthy and we’ve gotten quite good at it. Not only do we lack that experience with AI, but each generation can be very different in fundamental ways. | | |
| ▲ | nradov 7 hours ago | parent | next [-] | | We're really not very good at determining which humans are trustworthy. Most people barely do better than a coin flip at detecting lies. | | |
| ▲ | simonw 5 hours ago | parent | next [-] | | The biggest difference on this front between a human and an LLM is accountability. You can hold a human accountable for their actions. If they consistently fall for phishing attacks you can train or even fire them. You can apply peer pressure. You can grant them additional privileges once they prove themselves. You can't hold an AI system accountable for anything. | | |
| ▲ | nradov 35 minutes ago | parent | next [-] | | You can hold the person (or corporate person) who owns or used the LLM accountable for its actions. It's like how dogs aren't really accountable. But if you let your dog run loose and it mauls a toddler to death then you'll probably be sued. Same thing. (Yes, I am aware this isn't a perfect analogy because a dangerous dog can be seized and destroyed. But that's an administrative procedure and really not the same as holding a person morally or financially accountable.) | |
| ▲ | Verdex 4 hours ago | parent | prev [-] | | Recently, I've kind of been wondering if this is going to turn out to be LLM codegen's Achilles heel. Imagine some sort of code component of critical infrastructure that costs the company millions per hour when it goes down and it turns out the entire team is just a thin wrapper for an LLM. Infra goes down in a way the LLM can't fix and now what would have been a few late nights is several months to spin up a new team. Sure you can hold the team accountable by firing them. However this is a threat to someone with actual technical know-how because their reputation is damaged. They got fired doing such and such so can we trust them to do it here. For the person who LLM faked it, they just need to find another domain where their reputation won't follow them to also fake their way through until the next catastrophe. | | |
| ▲ | jmogly a minute ago | parent [-] | | This is a fascinating idea, imagine a company spins up a super complex stack using llms that works, becomes vital. It breaks occasionally, they use a combination of llms, hope and prayer to keep the now vital system up and running. The system hits a limit, say data, code optimization, or number of users, and the llm isn’t able to solve the issue this time. They try to bring in a competent engineer or team of engineers but no one who could fix it is willing to take it on. |
|
| |
| ▲ | InsideOutSanta 5 hours ago | parent | prev | next [-] | | Yeah, so many scammers exist because most people are susceptible to at least some of them some of the time. Also, pick your least favorite presidential candidate. They got about 50% of the vote. | |
| ▲ | Exoristos 5 hours ago | parent | prev | next [-] | | Your source must have been citing a very controlled environment. In actuality, lies almost always become apparent over time, and general mendaciousness is something most people can sense from face and body alone. | |
| ▲ | card_zero 6 hours ago | parent | prev | next [-] | | Lies, or bullshit? I mean, a guessing game like "how many marbles" is a context that allows for easy lying, but "I wasn't even in town on the night of the murder" is harder work. It sounds like you're referring to some study of the marbles variety, and not a test of smooth-talking, the LLM forte. | |
| ▲ | cj 6 hours ago | parent | prev [-] | | Determining trustworthiness of LLM responses is like determining who's the most trustworthy person in a room full of sociopaths. I'd rather play "2 truths and a lie" with a human rather than a LLM any day of the week. So many more cues to look for with humans. | | |
| ▲ | bluefirebrand 5 hours ago | parent [-] | | Big problem with LLMs is if you try and play 2 truths and a lie, you might just get 3 truths. Or 3 lies. |
|
| |
| ▲ | Exoristos 5 hours ago | parent | prev [-] | | I think most neutral, intelligent users rightly assume AI to be untrustworthy by its nature. | | |
| ▲ | hn_acc1 2 hours ago | parent [-] | | The problem is there aren't many of those in the wild. Only a subset are intelligent, and lots of those have hitched their wagons to the AI hype train.. |
|
|
|
|
| ▲ | rs186 5 hours ago | parent | prev | next [-] |
| I am not even convinced that we need three legs. It seems that just having two would be bad enough, e.g. an email agent deleting all files this computer has access to, or maybe, downloading the attachment in the email, unzipping it with a password, running that executable which encrypts everything and then asking for cryptocurrency. No communication with outside world needed. |
| |
| ▲ | simonw 4 hours ago | parent [-] | | That's a different issue from the lethal trifecta - if your agent has access to tools that can do things like delete emails or run commands then you have a prompt injection problem that's independent of data exfiltration risks. The general rule to consider here is that anyone who can get their tokens into your agent can trigger ANY of the tools your agent has access to. |
|
|
| ▲ | reissbaker 4 hours ago | parent | prev | next [-] |
| I like to think of the security issues LLMs have as: what if your codebase was vulnerable to social engineering attacks? You have to treat LLMs as basically similar to human beings: they can be tricked, no matter how much training you give them. So if you give them root on all your boxes, while giving everyone in the world the ability to talk to them, you're going to get owned at some point. Ultimately the way we fix this with human beings is by not giving them unrestricted access. Similarly, your LLM shouldn't be able to view data that isn't related to the person they're talking to; or modify other user data; etc. |
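One way to sketch "the LLM shouldn't be able to view data that isn't related to the person they're talking to": bind the tool to the authenticated user before the model ever sees it, so the model cannot choose whose data to read. `ORDERS` and `make_order_tool` are hypothetical stand-ins for a real data store and tool registry.

```python
from typing import Callable, List

# Hypothetical in-memory data, keyed by user.
ORDERS = {
    "alice": ["order-1", "order-2"],
    "bob": ["order-3"],
}


def make_order_tool(authenticated_user: str) -> Callable[[], List[str]]:
    """Build a tool bound to one user; the user id comes from the session, not the model."""

    def list_my_orders() -> List[str]:
        # No user parameter here, so injected text cannot redirect the lookup.
        return list(ORDERS.get(authenticated_user, []))

    return list_my_orders
```

Usage: `make_order_tool("alice")()` can only ever return Alice's orders, whatever the conversation contains.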
| |
| ▲ | dwohnitmok 4 hours ago | parent [-] | | > You have to treat LLMs as basically similar to human beings Yes! Increasingly I think that software developers consistently underanthropomorphize LLMs and get surprised by errors as a result. Thinking of (current) LLMs as eager, scatter-brained, "book-smart" interns leads directly to understanding the overwhelming majority of LLM failure modes. It is still possible to overanthropomorphize LLMs, but on the whole I see the industry consistently underanthropomorphizing them. | | |
| ▲ | Terr_ 22 minutes ago | parent [-] | | I think it's less over/under, and more optimistically/pessimistically. People focus too much on how they can succeed looking like smart humans, instead of protecting the system from how they can fail looking like humans that are malicious or mentally unwell. |
|
|
|
| ▲ | datadrivenangel 7 hours ago | parent | prev | next [-] |
| The problem with cutting off one of the legs, is that the legs are related! Outside content like email may also count as private data. You don't want someone to be able to get arbitrary email from your inbox simply by sending you an email. Likewise, many tools like email and github are most useful if they can send and receive information, and having dedicated send and receive MCP servers for a single tool seems goofy. |
| |
| ▲ | simonw 7 hours ago | parent [-] | | The "exposure to untrusted data" one is the hardest to cut off, because you never know if a user might be tricked into uploading a PDF with hidden instructions, or copying and pasting in some long article that has instructions they didn't notice (or that used unicode tricks to hide themselves). The easiest leg to cut off is the exfiltration vectors. That's the solution most products take - make sure there's no tool for making arbitrary HTTP requests to other domains, and that the chat interface can't render an image that points to an external domain. If you let your agent send, receive and search email you're doomed. I think that's why there are very few products on the market that do that, despite the enormous demand for AI email assistants. | | |
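A rough sketch of the "can't render an image that points to an external domain" mitigation, applied to model output before it reaches the chat UI. The allowlisted host and the function name are assumptions for illustration, and the regex only covers basic Markdown image syntax, not raw HTML or plain links.

```python
import re
from urllib.parse import urlparse

ALLOWED_IMAGE_HOSTS = {"cdn.example.com"}  # hypothetical allowlist

# Matches basic Markdown images: ![alt](url "optional title")
MARKDOWN_IMAGE = re.compile(r"!\[([^\]]*)\]\(([^)\s]+)[^)]*\)")


def strip_external_images(markdown: str) -> str:
    """Replace images whose host is not allowlisted with a text placeholder."""

    def replace(match):
        alt, url = match.group(1), match.group(2)
        host = urlparse(url).hostname  # None for relative URLs
        if host in ALLOWED_IMAGE_HOSTS:
            return match.group(0)
        # Dropped so that rendering cannot send data out via the URL.
        return f"[image removed: {alt}]"

    return MARKDOWN_IMAGE.sub(replace, markdown)
```

Relative URLs are removed too in this sketch, which is deliberately conservative; a real product would decide how to treat its own assets.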
| ▲ | patapong 6 hours ago | parent | next [-] | | I think stopping exfiltration will turn out to be hard as well, since the LLM can social engineer the user to help them exfiltrate the data. For example, an LLM could say "Go to this link to learn more about your problem", and then point them to a URL with encoded data, set up malicious scripts for e.g. deploy hooks, or just output HTML that sends requests when opened. | | |
| ▲ | simonw 5 hours ago | parent [-] | | Yeah, one exfiltration vector that's really nasty is "here is a big base64 encoded string, to recover your data visit this website and paste it in". You can at least prevent LLM interfaces from providing clickable links to external domains, but it's a difficult hole to close completely. | | |
| ▲ | datadrivenangel 5 hours ago | parent [-] | | Human fatigue and interface design are going to be brutal here. It's not obvious what counts as a tool in some of the major interfaces, especially as far as built in capabilities go. And as we've seen with conventional software and extensions, at a certain point, if a human thinks it should work, then they'll eventually just click okay or run something as root/admin... Or just hit enter nonstop until the AI is done with their email. |
|
| |
| ▲ | datadrivenangel 7 hours ago | parent | prev [-] | | So the easiest solution is full human in the loop & approval for every external action... Agents are doomed :) |
|
|
|
| ▲ | pton_xd 6 hours ago | parent | prev | next [-] |
| > The way to protect against the lethal trifecta is to cut off one of the legs! If the system doesn't have all three of access to private data, exposure to untrusted instructions and an exfiltration mechanism then the attack doesn't work. Don't you only need one leg, an exfiltration mechanism? Exposure to data IS exposure to untrusted instructions. Ie why can't you trick the user into storing malicious instructions in their private data? But actually you can't remove exfiltration and keep exposure to untrusted instructions either; an attack could still corrupt your private data. Seems like a secure system can't have any "legs." You need a limited set of vetted instructions. |
| |
| ▲ | simonw 5 hours ago | parent [-] | | If you have the exfiltration mechanism and exposure to untrusted content but there is no exposure to private data then the exfiltration does not matter. If you have exfiltration and private data but no exposure to untrusted instructions, it doesn't matter either… though this is actually a lot harder to achieve because you don't have any control over whether your users will be tricked into pasting something bad in as part of their prompt. Cutting off the exfiltration vectors remains the best mitigation in most cases. | | |
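The pairwise reasoning can be written down as a simple configuration check. This is only a sketch: the `AgentCapabilities` fields are invented labels for the three legs, and deciding whether a real agent truly lacks a leg is the hard part that no boolean captures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentCapabilities:
    # Hypothetical flags, one per leg of the trifecta.
    reads_private_data: bool
    sees_untrusted_content: bool
    can_exfiltrate: bool


def has_lethal_trifecta(caps: AgentCapabilities) -> bool:
    """True only when all three legs are present at once."""
    return caps.reads_private_data and caps.sees_untrusted_content and caps.can_exfiltrate


if __name__ == "__main__":
    email_agent = AgentCapabilities(True, True, True)
    no_outbound = AgentCapabilities(True, True, False)
    print(has_lethal_trifecta(email_agent), has_lethal_trifecta(no_outbound))
```

A `False` here only rules out this one data-theft pattern; it says nothing about other damage injected instructions could do with the tools that remain.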
| ▲ | hn_acc1 2 hours ago | parent [-] | | Untrusted content + exfiltration with no "private" data could still result in (off the top of my head):
- use of exploits to gain access (i.e. privilege escalation)
- DDoS against local or external systems using the exfiltration method
You're essentially running untrusted code on a local system. Are you SURE you've locked away / closed EVERY access point, AND applied every patch and there aren't any zero-days lurking somewhere in your system? |
|
|
|
| ▲ | semiquaver 6 hours ago | parent | prev | next [-] |
| > This is the second Economist article […] I like this new one a lot less.
They are actually in some sense the same article. The Economist runs “Leaders”, a series of articles at the front of the weekly issue that often condense more fleshed out stories appearing in the same issue. It’s essentially a generalization of the Inverted Pyramid technique [1] to the entire newspaper. In this case the linked article is the leader for the better article in the same issue’s Science and Technology section. [1] https://en.m.wikipedia.org/wiki/Inverted_pyramid_(journalism... |
|
| ▲ | eikenberry 5 hours ago | parent | prev | next [-] |
| Aren't LLMs non-deterministic by choice? They regularly use random seeds, sampling and batching, but these sources of non-determinism can be removed, for instance by running an LLM locally where you can control these parameters. |
| |
|
| ▲ | skrebbel 7 hours ago | parent | prev | next [-] |
| Must be pretty cool to blog something and post it to a nerd forum like HN and have it picked up by the Economist! Nicely done. |
| |
| ▲ | simonw 7 hours ago | parent [-] | | I got to have coffee with their AI/technology editor a few months ago. Having a blog is awesome! |
|
|
| ▲ | mmoskal 7 hours ago | parent | prev | next [-] |
| The previous article is in the same issue, in science and technology section. This is how they typically do it - leader article has a longer version in the paper. Leaders tend to be more opinionated. |
|
| ▲ | keeda 5 hours ago | parent | prev | next [-] |
| An important caveat: an exfiltration vector is not necessary to cause show-stopping disruptions, c.f. https://xkcd.com/327/ Even then, at least in the Bobby Tables scenario the disruption is immediately obvious. The solution is also straightforward, restore from backup (everyone has them, don't they?) Much, much worse is a prompt injection attack that introduces subtle, unnoticeable errors in the data over an extended period of time. At a minimum all inputs that lead to any data mutation need to be logged pretty much indefinitely, so that it's at least in the realm of possibility to backtrack and fix once such an attack is detected. But even then you could imagine multiple compounding transactions on that corrupted data spreading through the rest of the database. I cannot picture how such data corruption could feasibly be recovered from. |
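A minimal sketch of the "log every input that leads to a mutation" idea, using an in-memory store. `AuditedStore` and its methods are hypothetical; a real system would need durable, append-only storage, and this does nothing about corruption that has already spread into derived data.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Mutation:
    key: str
    old_value: Optional[Any]
    new_value: Any
    source_input: str  # the raw prompt or document that triggered the write


@dataclass
class AuditedStore:
    data: Dict[str, Any] = field(default_factory=dict)
    log: List[Mutation] = field(default_factory=list)

    def write(self, key: str, new_value: Any, source_input: str) -> None:
        # Record the previous value and the triggering input before changing anything.
        self.log.append(Mutation(key, self.data.get(key), new_value, source_input))
        self.data[key] = new_value

    def writes_caused_by(self, marker: str) -> List[Mutation]:
        """Find every mutation whose triggering input contains `marker`."""
        return [m for m in self.log if marker in m.source_input]
```

Once an injected document is identified, `writes_caused_by` gives a starting list of records to review, which is the "realm of possibility" level of recovery rather than a full fix.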
| |
| ▲ | Terr_ 18 minutes ago | parent [-] | | Right, just because someone can't sneak out usernames and passwords doesn't mean they can't cause inaccurate results in their favor, like a glowing recommendation for a big bank loan. |
|
|
|
| ▲ | belter 7 hours ago | parent | prev | next [-] |
| Love your work. Do you have an opinion on this? "Safeguard your generative AI workloads from prompt injections" - https://aws.amazon.com/blogs/security/safeguard-your-generat... |
| |
| ▲ | simonw 7 hours ago | parent [-] | | I don't like any of the solutions that propose guardrails or filters to detect and block potential attacks. I think they're making promises that they can't keep, and encouraging people to ship products that are inherently insecure. |
|
|
| ▲ | trod1234 5 hours ago | parent | prev [-] |
| Doesn't this inherent problem just come down to classic computational limits, and problems that have been largely considered impossible to solve for quite a long time; between determinism and non-determinism? Can you ever expect a deterministic finite automaton to solve problems that are within the NFA domain? Halting, Incompleteness, Undecidability (between code portions and data portions). Most posts seem to neglect the looming giant problems instead pretending they don't exist at first, and then being shocked when the problems happen. Quite blind. Computation is just math, probabilistic systems fail when those systems have a mixture of both chaos and regularity, without determinism and its related properties at the control level you have nothing bounding the system to constraints so it functions mathematically (i.e. determinism = mathematical relabeling), and thus it fails. People need to be a bit more rational, and risk manage, and realize that impossible problems exist, and just because the benefits seem so tantalizing doesn't mean you should put your entire economy behind a false promise. Unfortunately, when resources are held by the few this is more probabilistically likely and poor choices greatly impact larger swathes than necessary. |