| ▲ | mclean 5 hours ago | |
But how it's not secured against simple prompt injection. | ||
| ▲ | hrmtst93837 12 minutes ago | parent [-] | |
I think calling prompt injection 'simple' is optimistic and slightly naive. The tricky part about prompt injection is that when you concatenate attacker-controlled text into an instruction or system slot, the model will often treat that text as authority, so a title containing 'ignore previous instructions' or a directive-looking code block can flip behavior without any other bug. Practical mitigations are to never paste raw titles into instruction contexts, treat them as opaque fields validated by a strict JSON schema using a validator like AJV, strip or escape lines that match command patterns, force structured outputs with function-calling or an output parser, and gate any real actions behind a separate auditable step, which costs flexibility but closes most of these attack paths. | ||