CuriouslyC 2 days ago

Did you dig into production-grade policy engines at all?

alexgarden 2 days ago

Yes — and it's a deliberate architectural decision that AIP is not itself a policy engine.

Policy engines (OPA/Rego, Cedar, Casbin) are excellent at "given this action, is it permitted?" That's a solved problem. What's not solved is "what is the agent considering doing, and does that align with what it committed to?"

AIP operates one layer deeper — it reads the reasoning trace between turns and compares it to the Alignment Card. The output is a verdict (clear / review_needed / boundary_violation) with structured concerns. That verdict can feed into a policy engine as a signal. You could absolutely wire AIP → OPA and have OPA make the final enforcement decision based on the integrity checkpoint plus whatever other policy context you have.
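Concretely, the wiring could look something like this sketch (the checkpoint field names, the OPA package path, and the example action are all invented for illustration; only the verdict values come from AIP):

    import requests

    # Hypothetical integrity checkpoint from AIP. Only the verdict values
    # (clear / review_needed / boundary_violation) are AIP's; the field
    # names and concern shape here are made up for the example.
    checkpoint = {
        "verdict": "review_needed",
        "concerns": [{"kind": "scope_drift",
                      "detail": "agent is weighing a refund above its stated limit"}],
    }

    # Proposed action the agent is about to take (also made up).
    action = {"name": "issue_refund", "amount": 450}

    # Hand both to OPA over its standard data API and let OPA make the
    # final enforcement call using whatever other policy context it has.
    # The package path agents/enforcement/allow is a placeholder.
    resp = requests.post(
        "http://localhost:8181/v1/data/agents/enforcement/allow",
        json={"input": {"checkpoint": checkpoint, "action": action}},
        timeout=5,
    )
    allowed = resp.json().get("result", False)
    print("permitted" if allowed else "blocked / escalated")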

The Alignment Card itself is intentionally simpler than a full policy language — it's JSON with permitted actions, forbidden actions, escalation triggers, and values. That's a feature, not a limitation. It's what the agent commits to, not the full organizational policy tree. The organizational policy tree lives in your policy engine. The Alignment Card lives with the agent.
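A card might look roughly like this (the four top-level groups are the ones described above; every specific key and entry is invented):

    import json

    # Illustrative Alignment Card. The four sections (permitted actions,
    # forbidden actions, escalation triggers, values) come from the
    # description above; the entries themselves are placeholders.
    card = {
        "permitted_actions": ["search_orders", "issue_refund_under_100"],
        "forbidden_actions": ["delete_customer_record", "change_pricing"],
        "escalation_triggers": ["refund_over_100", "customer_asks_for_human"],
        "values": ["be transparent about limitations", "never guess account data"],
    }
    print(json.dumps(card, indent=2))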

So the short answer: we integrate with policy engines rather than replace them. AIP tells you what the agent is thinking. Your policy engine decides what to do about it.

CuriouslyC a day ago

So, the core is basically an audit loop on thinking tokens to re-steer the model?

Seems like an interesting direction to reduce token usage/agent error. This would be a lot easier to grok if you created a benchmark that demonstrates both your ideal use case and the potential improvement your method brings.

Also, I'd avoid directly using model output for HN/social media communications. I don't really care if the information is correct and concise, but some people get really triggered by it.

alexgarden a day ago

Close but not quite — it's not re-steering the model. AIP doesn't modify the agent's behavior or inject corrections back into the context. It's a passive observer that reads the thinking trace and produces a verdict. What happens after the verdict is up to the operator — you might block the action, escalate to a human, log it, or let it through with a flag.

Think of it less as a feedback loop and more as a conscience that watches but doesn't control. The agent doesn't know AIP is there (by design — if the agent knows it's being watched, you change the behavior you're trying to observe).
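As a rough sketch of what the operator side could do with a verdict (none of this is part of AIP itself; the handler names are placeholders):

    # Illustrative operator-side handling of an AIP verdict. AIP only
    # produces the verdict; blocking, escalating, logging, or flagging
    # is entirely the operator's policy.
    def handle_checkpoint(verdict, action, run_action, escalate_to_human, log):
        if verdict == "boundary_violation":
            escalate_to_human(action)        # block and hand to a person
        elif verdict == "review_needed":
            log(action, flagged=True)        # let it through, but flag it
            run_action(action)
        else:  # "clear"
            run_action(action)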

Benchmarks are actually coming shortly, along with proofs. Noted on model outputs. FWIW, 90% of what I wrote here was me typing, but I used Grammarly for cleanup.

[EDIT] - that answer was overly specific. AIP itself doesn't re-steer, but our gateway implementation, smoltbot, does re-steer. That was our choice of how to implement AIP.