wat10000 3 days ago

Think about how LLMs work. They’re trained to imitate the training data.

What’s in the training data involving threats of punishment? A lot of those threats are followed by compliance. The LLM will imitate that by following your threat with compliance.

Similarly, you can offer payment to some effect. You won’t pay, and the LLM has no use for the money even if you did, but that doesn’t matter. The training data has people offering payment and other people doing as instructed afterwards.

Oddly enough, offering threats or rewards is the opposite of anthropomorphizing the LLM. If it were really human (or equivalent), it would know that your threats or rewards are completely toothless, and would either ignore them or take them as a sign that you’re an untrustworthy liar.

georgefrowny 3 days ago

What actual training data does contain threats of punishment like this? It's not like most of the web has explicit threats of punishment followed immediately by compliance.

And only the schlockiest fan fiction would have "Do what I want or you'll be punished!" "Yes, master, I obey without question".

wat10000 2 days ago

Internet forums contain numerous examples of rules followed by statements of what happens if you don’t follow them, followed by people obeying them.