yongjik 2 hours ago
As I said, I believe statistical physics is a very good intuitive guide here. Molecules move randomly, but that does not mean a cup of water will spontaneously boil itself. Sometimes the probability of an event is so low that, even though it is not mathematically zero, it does not matter: you will never observe it in the known universe.

Likewise, the fact that an LLM generates each token probabilistically does not mean there is a realistic chance of it generating arbitrary output, where we can define "realistic" as "if we converted the entire known universe into data centers and ran this model until the heat death of the universe, we would encounter it at least once."

Of course, that does not mean LLMs are infallible. They fail all the time! But you can't explain those failures as a fundamental shortcoming of a probabilistic architecture; that's not a logical argument. Or, to return to the original discussion: the fact that this one particular LLM generated a command to delete the database is not a fundamental shortcoming of the LLM architecture. It's just a shortcoming of the LLMs we currently have.
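To put rough numbers on the "never observed in the known universe" point, here's a back-of-the-envelope sketch. The per-token probability, sequence length, and token budget are all made-up illustrative values, not measurements of any real model:

```python
# Suppose a "bad" output requires the model to pick a specific
# strongly-disfavored token (assume p = 1e-6) at each of 20 steps.
p_per_token = 1e-6
seq_len = 20

# Probability of the whole sequence appearing at a given position.
p_seq = p_per_token ** seq_len  # 1e-120

# A wildly generous budget: far more tokens than any conceivable
# fleet of data centers could ever generate.
token_budget = 1e100

# Expected number of times we'd ever see the sequence.
expected_occurrences = token_budget * p_seq  # 1e-20, i.e. effectively never
print(expected_occurrences)
```

The point of the sketch is only that nonzero probabilities can still be unobservably small; whether a given real model's dangerous outputs are actually in that regime is exactly what's in dispute below.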
maxbond 2 hours ago
I feel like we're talking at cross purposes, so I'd like to pin down what our disagreement actually is.

In distributional language modeling, it is assumed that any sequence of tokens may appear, and the task is to assign probabilities to those sequences. We don't write explicit grammars that declare some sequences valid and others invalid. Do you disagree with that? Why?

No matter how much prompting you give the agent, that does not eliminate the possibility of it producing a dangerous output. It is always possible for the agent to produce a dangerous output. Do you disagree with that? Why?

The only defensible position is to assume there is no output your agent cannot produce, and therefore to assume it will produce dangerous outputs and act accordingly. Do you disagree with that? Why?
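The "no output is impossible" claim follows directly from how these models sample: the softmax over the vocabulary assigns strictly positive probability to every token, since exp(x) > 0 for any finite logit. A minimal sketch (toy logits, standalone softmax, no real model involved):

```python
import math

def softmax(logits):
    # Numerically stable softmax. Every output is strictly positive,
    # because exp of any finite number is positive.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Even a token the model strongly disfavors (logit -30) gets a tiny
# but nonzero probability, so no sequence is assigned probability zero.
probs = softmax([10.0, 0.0, -30.0])
print(min(probs) > 0)  # True
```

(Sampling tricks like top-k or top-p truncation do zero out tokens per step, but that's a property of the decoder, not of the underlying distribution the model defines.)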
| ||||||||