snug 3 hours ago
I think this could be great as an additional layer of security: have a non-LLM layer do some analysis with static rules first, and only if something seems phishy run it through the LLM judge, so you don't have to run every request through it, which would be very expensive. Edit: actually, it looks like it has two policy engines embedded.
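The tiered approach described above can be sketched minimally: a cheap static pass (here, hypothetical regex rules) screens every request, and only flagged requests pay the cost of the LLM judge. The rule patterns and the `llm_judge` stub are assumptions for illustration, not any real product's API.

```python
import re

# Hypothetical static rules: cheap regex checks for common injection tells.
SUSPICIOUS_PATTERNS = [
    re.compile(r"ignore (all )?previous instructions", re.I),
    re.compile(r"system prompt", re.I),
]

def static_screen(request: str) -> bool:
    """Cheap first pass: True if any static rule matches."""
    return any(p.search(request) for p in SUSPICIOUS_PATTERNS)

def llm_judge(request: str) -> bool:
    """Stand-in for the expensive LLM judge call (assumed interface).
    For this demo it simply blocks anything mentioning 'password'."""
    return "password" in request.lower()

def screen(request: str) -> str:
    # Escalate to the LLM judge only when the static layer flags the
    # request, so most traffic never incurs the extra tokens and latency.
    if not static_screen(request):
        return "allow"
    return "block" if llm_judge(request) else "allow"
```

In a real deployment the judge would be a model call and the verdict would feed a policy engine, but the control flow (cheap filter gating an expensive one) is the same.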
windexh8er 3 hours ago | parent
And we don't think the judge can or will be gamed? Also, it's an LLM: it's going to add delay and additional token burn. One subjective black box protecting another subjective black box. I mean, what couldn't go wrong?
ImPostingOnHN 3 hours ago | parent
What happens when a prompt injection attack exploits the judge LLM and results in a higher level of attacker control than if it never existed?