nialse 8 hours ago
It's not necessarily "ignoring" instructions; it's the ironic effect where mentioning something not to focus on produces focus on that very thing. The classic version is: "For the next minute, try not to think about a pink elephant. You can think about anything else you like, just not a pink elephant."
fennecbutt 8 hours ago
Yes, exactly. But for LLMs it's more that the model isn't really "thinking" about what it's saying per se; it's predicting the next token. Sure, in a super fancy way, but still predicting the next token. Context poisoning is real.