Mars008 4 days ago

1. Something I don't understand: wasn't attention with query/key supposed to filter out irrelevant tokens?

2. This CatsAttack has many applications. For example, it can probably confuse safety and spam filters. It could also be tried on image generators...

ethan_smith 3 days ago | parent [-]

Attention weights can still assign non-zero probability to irrelevant tokens since the mechanism optimizes for prediction rather than semantic relevance, and these irrelevant tokens can create interference in the hidden state representations.
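
A minimal sketch of that point, assuming a single attention head with no learned projections and made-up toy vectors (the function names and all numbers here are illustrative, not taken from any real model): softmax turns every score into a strictly positive weight, so distractor tokens always contribute something to the output.

```python
import math


def softmax(scores):
    """Numerically stable softmax; every output is strictly positive."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(query, keys):
    """Scaled dot-product scores between one query and each key, then softmax."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    return softmax(scores)


def attend(query, keys, values):
    """Weighted average of the value vectors under the attention weights."""
    weights = attention_weights(query, keys)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


# Hypothetical toy data: token 0 matches the query, tokens 1-3 are distractors
# whose keys are orthogonal to the query (dot product of zero).
query = [1.0, 0.0]
keys = [[4.0, 0.0], [0.0, 4.0], [0.0, 4.0], [0.0, 4.0]]
values = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [0.0, 1.0]]

weights = attention_weights(query, keys)
clean = attend(query, keys[:1], values[:1])  # relevant token only
noisy = attend(query, keys, values)          # relevant token plus distractors

print("weights:", weights)
print("output without distractors:", clean)
print("output with distractors:   ", noisy)
```

Even though the distractor keys are orthogonal to the query, each still receives some weight, and together they pull the output away from the clean result. A real model has many heads and layers that can dampen or amplify this, so treat it as intuition rather than a description of any particular model.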