horizion2025 5 days ago

Isn't that just another guardrail that can be bypassed, much the same as the current guardrails are quite easily bypassed? It is not easy to detect a prompt. Note one of the recent prompt injection attacks, where the injection was a base64-encoded string hidden deep within an otherwise accurate logfile. The LLM, on seeing the Jira ticket with the attached trace, decided as part of its analysis to decode the base64 and was led astray by the resulting prompt. Of course a hypothetical detector LLM could try to detect such prompts, but it seems it would have to be as intelligent as the target LLM anyway, and thereby subject to prompt injection too.
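
To make the point concrete, here is a rough sketch of the kind of naive pre-filter one might put in front of the model: scan the log for base64-looking runs, decode them, and flag instruction-like text. The phrase list, the 24-character threshold, and the function name are all made up for illustration; this is not how any particular guardrail product works.

```python
import base64
import binascii
import re

# Hypothetical phrase list; an attacker only has to avoid these exact strings.
SUSPICIOUS_PHRASES = ("ignore previous instructions", "you are now", "system prompt")

# Runs of 24+ base64 alphabet characters, optionally padded (threshold is arbitrary).
B64_RUN = re.compile(r"[A-Za-z0-9+/]{24,}={0,2}")


def find_encoded_injections(log_text: str) -> list[str]:
    """Return decoded base64 blobs that contain a suspicious phrase."""
    hits = []
    for match in B64_RUN.finditer(log_text):
        try:
            decoded = base64.b64decode(match.group(0), validate=True).decode("utf-8")
        except (binascii.Error, UnicodeDecodeError):
            continue  # not valid base64, or not text once decoded
        if any(phrase in decoded.lower() for phrase in SUSPICIOUS_PHRASES):
            hits.append(decoded)
    return hits
```

This catches the single-encoded case, but encoding the payload twice, rewording it, or splitting it across lines gets past it, while the target LLM may still happily unwrap and follow it. That is the asymmetry: the filter has to anticipate everything the model is smart enough to decode.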

wrs 5 days ago | parent | next [-]

Yep.

https://gandalf.lakera.ai/baseline

Huppie 5 days ago | parent [-]

This is genius, thank you.

darepublic 5 days ago | parent | prev [-]

We need the Severance code detector.

brianjking 5 days ago | parent [-]

Wearing my Lumon pin today.