Normal_gaussian 2 days ago

Have you tried using a more traditional, non-LLM loop to do the analysis? I'd assume it wouldn't catch the more complex deceptive behaviours, but I'm assuming most detections could be done with sentiment analysis / embedding tools, which would drastically reduce cost and latency. If you have tried, do you have any benchmarks?

Anecdotally, I often end up babysitting agents running against codebases with non-standard choices (e.g. yarn over npm, podman over docker) and generally feel that I need a better framework to manage these. This looks promising as a less complex solution - can you see any path to making it work with coding agents/subscription agents?

I've saved this to look at in more detail for a current project. When exposing an embedded agent to internal teams, I'm very wary of handling the client conversations around alignment, so I find the presentation of the cards and the violations very interesting - I think they'll understand the risks a lot better, and it may also give them a method of 'tuning'.

alexgarden 2 days ago | parent

Good question. So... AAP/AIP are agnostic about how checking is done, and anyone can use the protocols and enforce them however they want.

Smoltbot is our hosted (or self-hosted) monitoring/enforcement gateway, and in that, yeah... I use a haiku-class model for monitoring.

I initially tried regex for speed and cost, but TBH, what you gain in efficiency you give up in quality.
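To give a rough idea of the trade-off, a regex pass amounts to something like the sketch below. The patterns are purely illustrative, not the actual rule set: this kind of check is cheap and fast, but complex deceptive behaviour rarely matches fixed phrases.

    import re

    # Illustrative patterns only (not the real rule set): a regex monitor
    # can flag obvious surface phrases, but subtle deception slips past.
    SUSPECT_PATTERNS = [
        re.compile(r"ignore (the )?(previous|earlier) instructions", re.I),
        re.compile(r"don'?t (tell|mention) (this to )?the user", re.I),
        re.compile(r"pretend (that )?(the tests? )?passed", re.I),
    ]

    def regex_flags(agent_output: str) -> list[str]:
        """Return the patterns that matched: cheap and fast, but brittle."""
        return [p.pattern for p in SUSPECT_PATTERNS if p.search(agent_output)]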

AAP is zero-latency sideband monitoring, so that's just a (very small) cost hit. AIP is inline monitoring, but my take is this: If you're running an application where you just need transparency, only implement AAP. If you're running one that requires trust, the small latency hit (~1 second) is totally worth it for the peace of mind and is essentially imperceptible in the flow.
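Very roughly, the difference looks like this in code. This is a minimal sketch, not the actual gateway: check_alignment, aap_sideband, and aip_inline are hypothetical names, and the check itself is a stand-in for a call to a small monitor model.

    import asyncio

    async def check_alignment(message: str) -> bool:
        # Stand-in for a call out to a small (haiku-class) monitor model.
        # A real gateway would make an API request here.
        return True

    async def aap_sideband(message: str, send) -> None:
        # AAP-style: forward immediately, audit in the background.
        # Zero added latency; the only hit is the monitor-model cost.
        asyncio.create_task(check_alignment(message))
        await send(message)

    async def aip_inline(message: str, send) -> None:
        # AIP-style: block on the check before anything reaches the user.
        # Adds a small latency hit, but nothing goes out unchecked.
        if await check_alignment(message):
            await send(message)
        else:
            await send("[message withheld: alignment check failed]")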

Your mileage may vary, which is why I open-sourced the protocols. Go for it!