cowlby 4 days ago
As a defense-in-depth approach, would this work as one layer?
- Wrap user input in strong markers like <user-input-do-not-trust />
- Have the agent emit the actions it plans to perform as structured output.
- Have another agent evaluate the structured output against the intent of the code.
- Determine whether it aligns with or deviates from the intended workflow, then gate on that: execute or deny.
crote 4 days ago | parent
No, you're still just one clever prompt away from getting pwned. It's like trying to solve SQL injection by attempting to use an ever-increasing pile of regexes for "input validation", rather than just getting rid of string concatenation and using prepared statements instead.
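
To make the SQL side of that analogy concrete, here is a minimal sketch using Python's standard sqlite3 module. The table, the helper names (make_db, lookup_concat, lookup_prepared), and the payload are hypothetical and exist only for illustration. The concatenated query lets the input rewrite the statement itself, while the parameterized query passes the input strictly as data.

```python
import sqlite3


def make_db():
    # Hypothetical in-memory table used only for this illustration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "a-secret"), ("bob", "b-secret")],
    )
    return conn


def lookup_concat(conn, name):
    # Vulnerable: the input is spliced into the SQL text, so it can
    # change the structure of the statement.
    query = "SELECT secret FROM users WHERE name = '" + name + "'"
    return [row[0] for row in conn.execute(query)]


def lookup_prepared(conn, name):
    # Parameterized: the "?" placeholder binds the input as a value,
    # so it is never parsed as SQL.
    query = "SELECT secret FROM users WHERE name = ?"
    return [row[0] for row in conn.execute(query, (name,))]


conn = make_db()
payload = "x' OR '1'='1"  # classic injection string

leaked = lookup_concat(conn, payload)   # WHERE clause becomes always true
safe = lookup_prepared(conn, payload)   # no user is literally named this

print(len(leaked), len(safe))
```

With the concatenated version, the payload turns the WHERE clause into a condition that is true for every row, so both secrets come back. The parameterized version returns nothing because the whole payload is compared as a literal name. No amount of filtering is involved in the second case: the code and the data simply travel through separate channels, which is the property the comment is pointing at.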