| ▲ | liuliu 2 hours ago | |
Sure. If you treat "guard model" as diversification strategy, it is another layer of protection, just like diversification in compilation helps solving the root of trust issue (Reflections on Trusting Trust). I am just generally suspicious about the weak-to-strong supervision. I think it is in general pretty futile to implement permission systems / guardrails which basically insert a human in the loop (humans need to review the work to fully understand why it needs to send that email, and at that point, why do you need a LLM to send the email again?). | ||
| ▲ | ramoz 2 hours ago | parent [-] | |
fair enough | ||