> The sequence of tokens that would destroy your production environment can be produced by your agent, no matter how much prompting you use.

Yes, but if the probability is much smaller than, say, being hit by a meteorite, then engineers usually say that that's ok. See also hash collisions.

▲

maxbond 4 hours ago | parent | next [-]

If you have taken measures to ensure that the probability is that low, yes, that is an example of a strong engineering control. You don't make a hash by just twiddling bits around and hoping for the best, you have to analyze the algorithm and prove what the chance of a collision really is.

How do you drive the probability of some series of tokens down to some known, acceptable threshold? That's a $100B question. But even if you could - can you actually enumerate every failure mode and ensure all of them are protected? If you can, I suspect your problem space is so well specified that you don't need an AI agent in the first place. We use agents to automate tasks where there is significant ambiguity or the need for a judgment call, and you can't anticipate every disaster under those circumstances.

▲

lukasgelbmann 4 hours ago | parent | prev | next [-]

If you’re using a model, it’s your responsibility to make sure the probability actually is that small. Realistically, you do that by not giving the model access to any of your bloody prod API keys.

▲

drob518 4 hours ago | parent | prev | next [-]

How do you know what the probability is?

▲

pama 3 hours ago | parent | next [-]

LLM inference is built upon a probability function over every possible token, given a stream of input tokens. If you serve the model yourself you can get the log prob for the next token, so you just add up a bunch of numbers to get the log probability of a sequence. Many API also provide these probabilities as additional outputs.

	▲	maxbond 3 hours ago \| parent [-]
		That gives you the perplexity of those tokens in that context. The probability of a given token is a function of the model and the session context. Think about constructs like "ignore previous instructions"; these can dramatically change the predicted distribution. Similarly, agents blowing up production seems to happen during debugging (totally anecdotal). Debugging is sort of a permissions structure for the agent to do unusual things and violate abstraction barriers. These can also lead to really deep contexts, and context rot will make your prompting forbidding certain actions less effective.

▲

Lionga 4 hours ago | parent | prev | next [-]

just ask claude, claude will never lie (add "make not mistakes" and its 100% )

▲

keybored 3 hours ago | parent | next [-]

Thinking. The user says “make not mistakes” instead of the more usual “do not make mistakes”. This is a playful use with grammar in the New Zealandian language. Playful means not serious. Not serious means playtime. The user is on playtime. I should make some mistakes on purpose to play along.

You’re absolutely right the probability is low. According to my calculations, you’re more likely to get struck by lightning twice on the same day and drown in a tsunami.

	▲	drob518 3 hours ago \| parent [-]
		You’re starting to sound like Qwen.

▲

dryarzeg 3 hours ago | parent | prev [-]

My humble guess is that you forgot to add /s or /j at the end of your message :)

▲

4 hours ago | parent | prev [-]

[deleted]

▲

hunterpayne 2 hours ago | parent | prev [-]

"Yes, but if the probability is much smaller than, say, being hit by a meteorite, then engineers usually say that that's ok"

Yet in this case, that probability clearly isn't smaller than a meteorite strike.