janalsncm 2 days ago

No system is 100% foolproof. If the baseline is “all malicious content gets through” and this method reduces that by 95%, with the last 5% relying on some sophisticated prompt injection, that’s not a “yikes”; that’s a major win.

At a technical level, the risk comes not from the size of the model but from the fact that it is open weight, so anyone can use it to create an adversarial payload.

simonw 2 days ago

I disagree. In software security 95% is not a win - it's an invitation for users to trust a system that they shouldn't be trusting.