▲ maxbond | 3 hours ago
There will be details like rounding errors that make certain sequences unreachable in practice, but that shouldn't give you any comfort unless you know your dangerous outputs fall into that space. And they absolutely don't: the sequences we're interested in - well-structured tool calls that contain dangerous parameters but are otherwise indistinguishable from desirable tool calls - are actually pretty probable. The probability that an ideal, continuous LLM would output an exact 0 for a particular token in its distribution is itself 0. For an LLM using real floating-point math, that probability isn't terrifically higher than 0.
▲ 317070 | 3 hours ago | parent
Source: I write transformers for a living. There is a piece of knowledge you seem to be missing.

Yes, a transformer outputs a distribution over all possible tokens at each step, and none of these probabilities is exactly zero - each is at least some epsilon. However, we usually don't sample from that raw distribution at inference time! The common approach, called nucleus sampling (also known as top-p sampling), keeps only the most likely tokens that together make up 95% of the probability mass, sets all other probabilities to zero, renormalizes, and then samples from the resulting distribution. There is a related parameter, `top-k`: if k is 50, you also zero out any token that is not among the 50 most likely.

In effect, for any token that gets sampled, there are usually only a handful of candidates out of the thousands of tokens in the vocabulary. So during sampling, most trajectories for the agent are literally impossible.
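A minimal pure-Python sketch of the filtering step described above (the function name and the toy distribution are illustrative, not taken from any particular inference library; real stacks apply this to logits before the softmax):

```python
def top_k_top_p_filter(probs, top_k=50, top_p=0.95):
    # Rank token indices from most to least likely.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cum = set(), 0.0
    for rank, idx in enumerate(order):
        if rank >= top_k:        # top-k cutoff: at most k candidates survive
            break
        keep.add(idx)
        cum += probs[idx]
        if cum >= top_p:         # nucleus reached: top_p of the mass covered
            break
    # Zero out everything else and renormalize.
    filtered = [p if i in keep else 0.0 for i, p in enumerate(probs)]
    total = sum(filtered)
    return [p / total for p in filtered]

# Toy vocabulary of 1000 tokens: one dominant token, a long flat tail.
probs = [0.7] + [0.3 / 999] * 999
out = top_k_top_p_filter(probs)
print(sum(1 for p in out if p > 0))  # here the top-k cutoff bites: 50 survive
```

With this toy distribution the flat tail never accumulates to 95%, so the top-k cutoff is what truncates it: 950 of the 1000 tokens end up with probability exactly 0 and can never be sampled.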