It is fundamental to language modeling that every sequence of tokens is possible. Murphy's Law, restated, is that every failure mode which is not prevented by a strong engineering control will happen eventually.

The sequence of tokens that would destroy your production environment can be produced by your agent, no matter how much prompting you use. That prompting is neither strong nor an engineering control; that's an administrative control. Agents are landmines that will destroy production until proven otherwise.

Most of these stories are caused by outright negligence, just giving the agent a high level of privileges. In this case they had a script with an embedded credential which was more privileged than they had believed - bad hygiene but an understandable mistake. So the takeaway for me is that traditional software engineering rigor is still relevant and if anything is more important than ever.

ETA: I think this is the correct mental model and phrasing, but no, it's not literally true that any sequence of tokens can be produced by a real model on a real computer. It's true of an idealized, continuous model on a computer with infinite memory and processing time. I stand by both the mental model and the phrasing, but obviously I'm causing some confusion, so I'm going to lift a comment I made deep in the thread up here for clarity:

> "Everything that can go wrong, will go wrong" isn't literally true either; some failure modes are mutually exclusive, so at most one of them will go wrong. I think that the punchy phrasing and the mental model are both more useful from the standpoint of someone creating/managing agents and that it is true in the sense that any other mental model or rule of thumb is true. It's literally true among spherical cows in a frictionless vacuum and directionally correct in the real world with its nuances. And most importantly, adopting the mental model leads to better outcomes.

▲

yongjik 5 hours ago | parent | next [-]

> It is fundamental to language modeling that every sequence of tokens is possible.

This is so trivially wrong that I don't understand why people repeat it. There are many valid criticisms of LLMs (especially the LLMs we currently have); this isn't one of them.

It's akin to saying that every molecule behaves randomly according to statistical physics, so you should expect your ceiling to spontaneously disintegrate any day, and if you find yourself under the rubble one day it's just a consequence of basic physics.

▲

nkrisc 4 hours ago | parent | next [-]

> It's akin to saying that every molecule behaves randomly according to statistical physics, so you should expect your ceiling to spontaneously disintegrate any day, and if you find yourself under the rubble one day it's just a consequence of basic physics.

Except your ceiling can and will fall on you unless you take preventative measures, entirely due to molecular interactions within the material.

Barring that, it is entirely possible and even quite likely that your ceiling will collapse on you or someone else some time in the future.

It boggles the mind to let an LLM have access to a production database without having explicit preventative measures and contingency plans for it deleting it.

▲

margalabargala 4 hours ago | parent [-]

I have lived about 40 years beneath ceilings and never personally taken a preventative measure. I allow my kids to walk under not only our own ceiling, but other people's ceilings, and I have never asked those people if their ceilings were properly maintained.

▲

nkrisc 4 hours ago | parent | next [-]

Your home almost certainly has preventative measures, including proper humidity and temperature control, structural reinforcement, etc.

I don't mean that you personally have taken those measures, but preventative measures have absolutely been taken. When they aren't, ceilings collapse on people.

See any sheetrock ceiling with a leak above it. Or look at any abandoned building: they will eventually always have collapsed floors/ceilings. It is inevitable.

▲

margalabargala an hour ago | parent [-]

Yeah that's the point. Humans are able to do things that prevent ceiling collapse. Entropy may mean all ceilings collapse eventually, but that doesn't mean we aren't able to make useful ceilings.

▲

withinboredom 2 hours ago | parent | prev | next [-]

I've had a ceiling fall on me once, and it happened once to a friend while on vacation. Just because it hasn't happened to you doesn't mean it hasn't happened to other people.

▲

margalabargala an hour ago | parent [-]

Thanks for the anecdote. I don't think it changes the point of the metaphor.

▲

maxbond 30 minutes ago | parent [-]

> Thanks for the anecdote.

They're only sharing an anecdote because they are responding to your anecdote about not seeing a ceiling collapse.

> I don't think it changes the point of the metaphor.

If their anecdote is moot, then your anecdote is also moot; if the anecdotes can only confirm a conclusion and never disconfirm, then we've created an unfalsifiable construction with the conclusion baked into its premises.

▲

nclin_ 4 hours ago | parent | prev [-]

Construction regulation is the preventative measure.

▲

caminante 5 hours ago | parent | prev | next [-]

The parent is also incorrectly re-phrasing Murphy's Law -- "Anything that can go wrong, will go wrong."

Actual quote:

> “If there are two or more ways to do something, and one of those ways can result in a catastrophe, then someone will do it that way.”

▲

ses1984 5 hours ago | parent | next [-]

Engineering controls basically mean making it impossible to do something in a way that results in catastrophe.

▲

maxbond 5 hours ago | parent | prev [-]

I'd be interested to hear why my restatement was incorrect. I'm confident that it's what Murphy meant, mostly because I've read his other laws and that's what I recall as the general through line. But that was a long time ago and perhaps I'm misremembering or was misinterpreting at the time.

▲

chrsw 5 hours ago | parent | prev | next [-]

Ceilings do fall on people. LLMs do delete production databases. Will these things always inevitably happen? No, but the moment it does happen to someone I doubt they will be thinking about probabilities or Murphy's law or whatever.

I guess the question is, since we know these things can happen, however unlikely, what mitigations should be in place that are commensurate with the harms that might result?

▲

yongjik 4 hours ago | parent | next [-]

Mostly, I agree with you. My complaint is that, when the ceiling fails, nobody says "Duh ceilings are supposed to fail, that's basic physics." Because that (1) helps nobody, and (2) betrays a fundamental misunderstanding of physics.

And I do think it's stupid to wire an LLM to a production database. Modern LLMs aren't that reliable (at least not yet), and the cost-benefit tradeoff does not make sense. (What do you even gain by doing that?)

However, you can't just look at that and say "Duh, this setup is bound to fail, because LLMs can generate every arbitrary sequence of tokens." That's a wrong explanation, and shows a misunderstanding of how LLMs (and probability) work.

▲

maxbond 4 hours ago | parent [-]

What is the right understanding of how LLMs work and what is the correct diagnosis?

▲

yongjik 4 hours ago | parent [-]

As I said, I believe statistical physics is a very good guide for intuition. Molecules move randomly. That does not mean a cup of water will spontaneously boil itself. Sometimes the probability of something happening is so low that even if it's not mathematically zero it does not matter because you'll never observe it in the known universe.

LLM generating each token probabilistically does not mean there's a realistic chance of generating any random stuff, where we can define "realistic" as "If we transform the whole known universe into data centers and run this model until the heat death of the universe, we will encounter it at least once."

Of course that does not mean LLMs are infallible. They fail all the time! But you can't explain that as a fundamental shortcoming of a probabilistic structure: that's not a logical argument.

Or, back to the original discussion, the fact that this one particular LLM generated a command to delete the database is not a fundamental shortcoming of LLM architecture. It's just a shortcoming of LLMs we currently have.

▲

maxbond 3 hours ago | parent [-]

I kinda feel like we're talking at cross purposes, so I'd like to understand what our disagreement actually is.

In distributional language modeling, it is assumed that any series of tokens may appear and we are concerned with assigning probabilities to those sequences. We don't create explicit grammars that declare some sequences valid and others invalid. Do you disagree with that? Why?

No matter how much prompting you give the agent, it does not eliminate the possibility that it will produce a dangerous output. It is always possible for the agent to produce a dangerous output. Do you disagree with that? Why?

The only defensible position is to assume that there is no output your agent cannot produce, and so to assume it will produce dangerous outputs and act accordingly. Do you disagree with that? Why?

▲

yongjik 2 hours ago | parent [-]

I think I've already explained my position, and I don't have any deeper insight than that, so I'll be only repeating myself. But to repeat one more time: when talking about probability, there's something like "not mathematically zero, but the probability is so low that we can assume that it will just never happen."

And it's good that we can think that way, because we also follow the rules of statistical and quantum physics, which are inherently probabilistic. So, basically, you can say the same things about people. There's a nonzero (but extremely small) probability that I'll suddenly go mad and stab the next person. There's a nonzero (but even smaller) probability that I'll spontaneously erupt into a cloud of lethal pathogen that will destroy humanity. Yada yada.

Yet, nobody builds houses under the assumption that one of the occupants would transform into a lethal cloud, and for good reason.

Yes, it does sound a bit more absurd when we apply it to humans. But the underlying principle is very similar.

(I think this will be my last comment here because I'm just repeating myself.)

▲

maxbond 2 hours ago | parent [-]

> [When] talking about probability, there's something like "not mathematically zero, but the probability is so low that we can assume that it will just never happen."

If this is our only point of disagreement, then we don't actually disagree. I understand "strong engineering control" to mean "something that reduces incidence of a failure mode to an acceptable level".

▲

Negitivefrags 4 hours ago | parent | prev [-]

> I guess the question is, since we know these things can happen, however unlikely, what mitigations should be in place that are commensurate with the harms that might result?

This isn't a defence of using LLMs like this, but this statement taken at face value is a source of a lot of terrible things in the world.

This is the kind of stuff that leads to a world where kids are no longer able to play outside.

▲

maxbond 5 hours ago | parent | prev | next [-]

> This is so trivially wrong that I don't understand why people repeat it.

I'd be interested in hearing this argument.

To address your chemistry example; in the same way that there is a process (the averaging of many random interactions) that leads to a deterministic outcome even though the underlying process is random, a sandbox is a process that makes an agent safe to operate even though it is capable of producing destructive tool calls.

▲

stratos123 5 hours ago | parent [-]

I wouldn't say it's trivially wrong but it's pretty much always wrong. There are two notable sampling parameters, `top-k` and `top-p`. When using an LLM for precise work rather than e.g. creative writing, one usually samples with the `top-p` parameter, and `top-k` is I think pretty much always used. And when sampling with either of these enabled, the set of possible tokens that the sampler chooses from (according to the current temperature) is much smaller than the set of all tokens, so most sequences are not in fact possible. It's only true that all sequences have a nonzero probability if you're sampling without either of these and with nonzero temperature.

▲

xmodem 5 hours ago | parent | next [-]

So it's only wrong in a technical and pedantic sense. A better phrasing might have been along the lines of "There are many sequences of tokens that will destroy your production database that are within the set of possible outputs"

▲

maxbond 4 hours ago | parent [-]

"Everything that can go wrong, will go wrong" isn't literally true either; some failure modes are mutually exclusive, so at most one of them will go wrong. I think that the punchy phrasing and the mental model are both more useful from the standpoint of someone creating/managing agents and that it is true in the sense that any other mental model or rule of thumb is true. It's literally true among spherical cows in a frictionless vacuum and directionally correct in the real world with its nuances. And most importantly, adopting the mental model leads to better outcomes.

But it may be a bad mental model in other contexts, like debugging models. As an extreme example, models that collapse during training become strictly deterministic, e.g. a language model that always predicts the most common token and never takes its context into account.

▲

setr 5 hours ago | parent | prev | next [-]

In a given run, only the top-k sequences are selected.

Across all runs, any sequence can be generated, and potentially scored highly.

Thus, any sequence can eventually be selected.

▲

maxbond 5 hours ago | parent | prev [-]

There will be details like rounding errors that will make certain sequences unreachable in practice, but that shouldn't provide you any comfort unless you know your dangerous outputs fall into that space. But they absolutely don't; the sequences we're interested in - well structured tool calls that contain dangerous parameters but are otherwise indistinguishable from desirable tool calls - are actually pretty probable.

The probability that an ideal, continuous LLM would output a 0 for a particular token in its distribution is itself 0. The probability that an LLM using real floating point math would do so isn't terrifically higher than 0.

▲

317070 4 hours ago | parent [-]

Source: I write transformers for a living.

There is a piece of knowledge you seem to be missing. Yes, a transformer will output a distribution over all possible tokens at a given step. And none of these are indeed zero, but always at least larger than epsilon.

However, we usually don't sample from that distribution at inference time!

The common approach (called nucleus sampling, also known as top-p sampling) will look at the largest probabilities that together make up, say, 95% of the probability mass. It will set all other probabilities to zero, renormalize, and then sample from the resulting probability distribution. There is another parameter, `top-k`: if k is 50, you zero out any token that is not among the 50 most likely tokens.

In effect, it means that for any token that is sampled, there is usually really only a handful of candidates out of the thousands of tokens that can be selected.

So during sampling, most trajectories for the agent are literally impossible.
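To make the filtering step concrete, here is a minimal sketch of top-k followed by top-p truncation over a toy next-token distribution. It is illustrative only and not the commenter's code: the function name `top_k_top_p_filter`, the token names, and the probabilities are made up, and real inference stacks typically work on logits with temperature applied and may order the two filters differently.

```python
def top_k_top_p_filter(probs, top_k=50, top_p=0.95):
    """Keep the top-k tokens, then the smallest prefix reaching top_p mass, and renormalize.

    probs: dict mapping token -> probability (assumed to sum to 1).
    """
    # Rank tokens from most to least likely and apply the top-k cutoff.
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)[:top_k]

    # Nucleus (top-p) cutoff: stop once the cumulative mass reaches top_p.
    kept = []
    cumulative = 0.0
    for token, prob in ranked:
        kept.append((token, prob))
        cumulative += prob
        if cumulative >= top_p:
            break

    # Everything not kept now has probability zero; renormalize the rest.
    total = sum(prob for _, prob in kept)
    return {token: prob / total for token, prob in kept}


# Hypothetical distribution over the first token of a SQL tool call.
next_token_probs = {
    "SELECT": 0.60,
    "UPDATE": 0.25,
    "DELETE": 0.10,
    "DROP": 0.04,
    "TRUNCATE": 0.01,
}

filtered = top_k_top_p_filter(next_token_probs, top_k=50, top_p=0.90)
print(sorted(filtered))  # ['DELETE', 'SELECT', 'UPDATE']
```

In this toy case `DROP` and `TRUNCATE` become unreachable, while `DELETE` stays inside the nucleus and can still be sampled.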

▲

hunterpayne 3 hours ago | parent | next [-]

Thank you for the explanation. But you do understand why none of that matters after the prod DB is gone right? Yes there should be backups but when management fires ops and dumps that work on the devs, it doesn't tend to happen.

So I want you to understand this. You are basically selling heroin to junkies and then acting like the consequences aren't in any way your fault. Management will far too often jump at false promises made by your execs. Your technology is inherently non-deterministic. Therefore your promises can't be true. Yet you are going to continue being part of a machine that destroys businesses and lives. Please at least act like you understand this.

▲

maxbond 4 hours ago | parent | prev [-]

I appreciate the information, I am weak on the details of LLM sampling algorithms, but I already conceded that the statement isn't literally true of realized models (it's true of idealized models) and the tokens we're concerned with are likely to be in the renormalized distribution because the desired and dangerous tokens are virtually the same.

▲

techblueberry 5 hours ago | parent | prev [-]

> so you should expect your ceiling to spontaneously disintegrate any day,

I mean, I do?

▲

djhn 4 hours ago | parent [-]

Throughout history people have taken precautions against ceilings disintegrating. One might even say, “strong engineering controls”.

Some of the best known laws from the ~1700BC Babylonian legal text, The Code of Hammurabi, are laws 228-233, which deal with building regulations.

229. If a builder builds a house for a man and does not make its construction firm, and the house which he has built collapses and causes the death of the owner of the house, that builder shall be put to death.

230. If it causes the death of the son of the owner of the house, they shall put to death a son of that builder.

233. If a builder constructs a house for a man but does not make it conform to specifications so that a wall then buckles, that builder shall make that wall sound using his silver (at his own expense).

That doesn’t sound like ceilings never disintegrated!

▲

amelius 6 hours ago | parent | prev | next [-]

> The sequence of tokens that would destroy your production environment can be produced by your agent, no matter how much prompting you use.

Yes, but if the probability is much smaller than, say, being hit by a meteorite, then engineers usually say that that's ok. See also hash collisions.

▲

maxbond 6 hours ago | parent | next [-]

If you have taken measures to ensure that the probability is that low, yes, that is an example of a strong engineering control. You don't make a hash by just twiddling bits around and hoping for the best, you have to analyze the algorithm and prove what the chance of a collision really is.
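As a worked example of the kind of analysis mentioned here, the birthday approximation puts the chance of at least one collision among n uniformly random b-bit hashes at roughly 1 - exp(-n(n-1)/2^(b+1)). The sketch below assumes an ideal hash with uniformly distributed outputs; the function name and the example sizes are illustrative.

```python
import math


def collision_probability(n_items, hash_bits):
    """Birthday approximation: P(at least one collision) among n_items ideal hashes."""
    exponent = -n_items * (n_items - 1) / (2 * 2**hash_bits)
    # -expm1(x) computes 1 - exp(x) without losing precision when x is tiny.
    return -math.expm1(exponent)


p_256 = collision_probability(10**9, 256)  # a billion items, 256-bit hash
p_64 = collision_probability(2**32, 64)    # about 4.3 billion items, 64-bit hash
print(p_256, p_64)
```

With these inputs the 256-bit case is vanishingly small, while the 64-bit case is closer to a coin flip, which is why the number has to be computed rather than assumed.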

How do you drive the probability of some series of tokens down to some known, acceptable threshold? That's a $100B question. But even if you could - can you actually enumerate every failure mode and ensure all of them are protected? If you can, I suspect your problem space is so well specified that you don't need an AI agent in the first place. We use agents to automate tasks where there is significant ambiguity or the need for a judgment call, and you can't anticipate every disaster under those circumstances.

▲

lukasgelbmann 6 hours ago | parent | prev | next [-]

If you’re using a model, it’s your responsibility to make sure the probability actually is that small. Realistically, you do that by not giving the model access to any of your bloody prod API keys.

▲

drob518 5 hours ago | parent | prev | next [-]

How do you know what the probability is?

▲

pama 5 hours ago | parent | next [-]

LLM inference is built upon a probability function over every possible token, given a stream of input tokens. If you serve the model yourself you can get the log prob for the next token, so you just add up a bunch of numbers to get the log probability of a sequence. Many APIs also provide these probabilities as additional outputs.
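A minimal sketch of that sum, assuming you already have one log probability per generated token, each conditioned on the tokens before it. The numbers are made up and no particular serving API is implied.

```python
import math


def sequence_logprob(token_logprobs):
    """Sum per-token log probabilities to get the log probability of the whole sequence."""
    return sum(token_logprobs)


# Made-up conditional probabilities for three consecutive tokens.
token_logprobs = [math.log(0.5), math.log(0.25), math.log(0.1)]

total_logprob = sequence_logprob(token_logprobs)
sequence_prob = math.exp(total_logprob)  # equals 0.5 * 0.25 * 0.1 up to rounding
print(total_logprob, sequence_prob)
```

The result only holds for the exact context the log probabilities were computed in; a different prompt or history gives different numbers.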

▲

maxbond 5 hours ago | parent [-]

That gives you the perplexity of those tokens in that context. The probability of a given token is a function of the model and the session context. Think about constructs like "ignore previous instructions"; these can dramatically change the predicted distribution.

Similarly, agents blowing up production seems to happen during debugging (totally anecdotal). Debugging is sort of a permissions structure for the agent to do unusual things and violate abstraction barriers. These can also lead to really deep contexts, and context rot will make your prompting forbidding certain actions less effective.

▲

Lionga 5 hours ago | parent | prev | next [-]

just ask claude, claude will never lie (add "make not mistakes" and its 100% )

▲

keybored 5 hours ago | parent | next [-]

Thinking. The user says “make not mistakes” instead of the more usual “do not make mistakes”. This is a playful use with grammar in the New Zealandian language. Playful means not serious. Not serious means playtime. The user is on playtime. I should make some mistakes on purpose to play along.

You’re absolutely right the probability is low. According to my calculations, you’re more likely to get struck by lightning twice on the same day and drown in a tsunami.

▲

drob518 5 hours ago | parent [-]

You’re starting to sound like Qwen.

▲

dryarzeg 5 hours ago | parent | prev [-]

My humble guess is that you forgot to add /s or /j at the end of your message :)

▲

hunterpayne 3 hours ago | parent | prev [-]

"Yes, but if the probability is much smaller than, say, being hit by a meteorite, then engineers usually say that that's ok"

Yet in this case, that probability clearly isn't smaller than a meteorite strike.

▲

tee-es-gee 5 hours ago | parent | prev | next [-]

I do think that as service providers we now have a new "attack vector" to be worried about. Up to now, having an API that deletes the whole volume, including backups, might have been acceptable, because generally users won't do such a destructive action via the API or if they do, they likely understand the consequences. Or at the very least don't complain if they do it without reading the docs carefully enough.

But now agents are overly eager to solve the problem and can be quite resourceful in finding an API to "start from clean-slate" to fix it.

▲

anygivnthursday 5 hours ago | parent | next [-]

> Up to now, having an API that deletes the whole volume, including backups, might have been acceptable

It was never acceptable; major service providers figured this out a long time ago and added all sorts of guardrails long before LLMs. Other providers will learn from their own mistakes, or not.

▲

lelanthran 4 hours ago | parent | prev | next [-]

> Up to now, having an API that deletes the whole volume, including backups, might have been acceptable,

So? I have those too; the difference is that:

1. The API is ACL'ed up the wazoo to ensure only a superuser can do it.
2. The purging of data is scheduled for 24h into the future while the unlinking is done immediately.
3. I don't advertise the API as suitable for agent interaction.

▲

jbxntuehineoh 5 hours ago | parent | prev [-]

it's a great source of schadenfreude though, I love watching vibecoders get their shit nuked
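The first two safeguards lelanthran lists (superuser-only deletes, immediate unlinking with a purge scheduled 24 hours later) can be sketched roughly as follows. This is an illustration under assumptions, not anyone's actual API: `VolumeStore`, `PermissionDenied`, the role string, and the in-memory dicts are all hypothetical stand-ins.

```python
from datetime import datetime, timedelta, timezone

PURGE_DELAY = timedelta(hours=24)


class PermissionDenied(Exception):
    """Raised when a caller without superuser rights requests a delete."""


class VolumeStore:
    def __init__(self):
        self.live = {}     # volume_id -> data, visible to users
        self.pending = {}  # volume_id -> (data, purge_at), unlinked but recoverable

    def delete(self, volume_id, caller_role, now):
        # Safeguard 1: only a superuser may delete.
        if caller_role != "superuser":
            raise PermissionDenied(f"{caller_role!r} may not delete volumes")
        # Safeguard 2: unlink immediately, purge only after the delay.
        data = self.live.pop(volume_id)
        self.pending[volume_id] = (data, now + PURGE_DELAY)

    def restore(self, volume_id, now):
        data, purge_at = self.pending[volume_id]
        if now >= purge_at:
            raise KeyError(f"{volume_id} is past its purge deadline")
        del self.pending[volume_id]
        self.live[volume_id] = data

    def purge_due(self, now):
        # Permanently drop everything whose grace period has elapsed.
        due = [vid for vid, (_, purge_at) in self.pending.items() if now >= purge_at]
        for vid in due:
            del self.pending[vid]
        return due


store = VolumeStore()
store.live["vol-1"] = b"customer data"
t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)

store.delete("vol-1", caller_role="superuser", now=t0)
store.restore("vol-1", now=t0 + timedelta(hours=1))  # still inside the 24h window
```

The delay turns an irreversible action into a reversible one for a day, which is the window in which a mistaken delete can be noticed and undone.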

▲

yen223 4 hours ago | parent | prev | next [-]

"It is fundamental to language modeling that every sequence of tokens is possible."

This isn't true, is it? LLMs have a finite number of parameters and a finite context length; surely the pigeonhole principle means you can't map that to the infinite permutations of output strings out there.

▲

maxbond 4 hours ago | parent [-]

No, it's not literally true, it's a mental model. I've added some clarification at the bottom of the comment.

▲

leptons an hour ago | parent | prev | next [-]

There is no way in hell I would give an LLM direct access to a database to write whatever query it wants. Just no way.

I'll create some safe APIs that I give the LLM access to where it can interact with a limited set of things the database can do, at most.
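One way such a "safe API" could look is an allowlist of named, parameterized queries, so the model picks an operation and supplies values but never writes SQL. This is a sketch under assumptions: the operation names, the `users` table, and `run_operation` are hypothetical, and only the standard-library `sqlite3` module is used.

```python
import sqlite3

# The only operations the agent may invoke; the SQL text is fixed by us.
ALLOWED_OPERATIONS = {
    "get_user_by_id": "SELECT id, name FROM users WHERE id = ?",
    "count_users": "SELECT COUNT(*) FROM users",
}


def run_operation(conn, name, params=()):
    """Execute a named, pre-approved query; reject anything not on the allowlist."""
    if name not in ALLOWED_OPERATIONS:
        raise ValueError(f"operation not allowed: {name}")
    # Parameters are bound by sqlite3, never spliced into the SQL string.
    return conn.execute(ALLOWED_OPERATIONS[name], params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (id, name) VALUES (?, ?)", [(1, "ada"), (2, "grace")])

print(run_operation(conn, "count_users"))          # [(2,)]
print(run_operation(conn, "get_user_by_id", (1,)))  # [(1, 'ada')]
```

A request for anything outside the table, such as a hypothetical `drop_users`, raises `ValueError` before any SQL runs. Restricting the database credential itself would be a further, separate layer.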

▲

TZubiri 3 hours ago | parent | prev [-]

I think this doesn't apply if you reduce temperature to 0. Which you should always do: temperature is like a tax users pay to help the LLM providers explore the output space. Just don't pay that tax and always choose the best token.
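For reference, here is a small sketch of what temperature does to a next-token distribution, with temperature 0 treated as greedy (argmax) decoding. The logits are made up and the function name is illustrative; real samplers handle ties and batching differently.

```python
import math


def next_token_distribution(logits, temperature):
    """Softmax over logits at a given temperature; temperature 0 means greedy (argmax)."""
    if temperature == 0:
        # Greedy decoding: all probability mass goes to the highest-scoring token.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [logit / temperature for logit in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # made-up scores for three candidate tokens
greedy = next_token_distribution(logits, temperature=0)
sampled = next_token_distribution(logits, temperature=1.0)
print(greedy)  # [1.0, 0.0, 0.0]
```

Greedy decoding makes the choice deterministic for a fixed context; whether the single most likely token is a safe one is a separate question.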