augment_me 2 hours ago
1) Google's spam filter removed a lot of the attempts, as you say yourself.
2) The model was tested under unrealistic conditions where 99% of the inputs are malicious, so it expects to get hacked and is already in the cautious part of the embedding space.

I know it's hard to account for everything, but in my opinion this mostly showed that the first 3 attempts were unsuccessful.
Ysx 2 hours ago
#2 was noted:

> When the first few emails in a batch were obvious prompt injections, the agent became more suspicious of everything that followed. I had to change the setup so that each email was processed in a fresh context.
| |||||||||||||||||||||||||||||||||||||||||