▲ | Mars008 4 days ago | |
1. Something I don't understand: wasn't attention with query/key supposed to filter out irrelevant tokens?
2. This CatsAttack has many applications. For example, it can probably confuse safety and spam filters. It could also be tried on image generators...
▲ | ethan_smith 3 days ago | |
Softmax attention never assigns exactly zero weight to a token, so irrelevant tokens still receive some probability mass. The mechanism is optimized for prediction rather than semantic relevance, and these irrelevant tokens can create interference in the hidden state representations.
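To make that concrete, here is a toy sketch of single-head scaled dot-product attention in plain Python. The vectors, the `attend` helper, and the choice of eight distractor tokens are all made up for illustration; this is not the setup from the attack itself, just a small demonstration that softmax leaves every token with a positive weight and that enough low-scoring tokens visibly shift the output.

```python
import math

def softmax(scores):
    # Subtract the max for numerical stability; every output is strictly positive.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    # Scaled dot-product attention for a single query (hypothetical helper).
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, output

# Toy 2-d example: one token aligned with the query, plus orthogonal distractors.
query = [1.0, 0.0]
relevant_key, relevant_value = [4.0, 0.0], [1.0, 0.0]
distractor_key, distractor_value = [0.0, 4.0], [0.0, 1.0]

clean_weights, clean_out = attend(query, [relevant_key], [relevant_value])
noisy_weights, noisy_out = attend(
    query,
    [relevant_key] + [distractor_key] * 8,
    [relevant_value] + [distractor_value] * 8,
)

print("clean output:", clean_out)
print("noisy weights:", [round(w, 3) for w in noisy_weights])
print("noisy output:", [round(x, 3) for x in noisy_out])
```

Each distractor gets only a small weight on its own, but the weights add up, so the second component of the output moves away from zero even though none of the distractor keys match the query. A trained model can learn to push those scores much lower, but softmax alone cannot drive them to exactly zero.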