horizion2025 5 days ago

Isn't that just another guardrail that can be bypassed much the same as the guard rails are currently quite easily bypassed? It is not easy to detect a prompt. Note some of the recent prompt injection attack where the injection was a base64 encoded string hidden deep within an otherwise accurate logfile. The LLM, while seeing the Jira ticket with attached trace , as part of the analysis decided to decode the b64 and was led a stray by the resulting prompt. Of course a hypothetical LLM could try and detect such prompts but it seems they would have to be as intelligent as the target LLM anyway and thereby subject to prompt injections too.

▲

wrs 5 days ago | parent | next [-]

Yep.

https://gandalf.lakera.ai/baseline

	▲	Huppie 5 days ago \| parent [-]
		This is genius, thank you.

▲

darepublic 5 days ago | parent | prev [-]

We need the severance code detector

	▲	brianjking 5 days ago \| parent [-]
		wearing my lumon pin today.