Remix.run Logo
76SlashDolphin 3 days ago

Me and a few friends are working on a firewall for LLM clients that blocks the lethal trifecta: https://github.com/Edison-Watch/open-edison

The way it works is the user registers / imports MCP (Model Context Protocol) servers they would like to use. All the tools of those servers are imported and then the firewall uses structured LLM calls to decide what types of action the tool performs among:

- read private data (e.g. read a local file or read your emails)

- perform an activity on your behalf (e.g. send an email or update a calendar invite)

- read public data (e.g. search the web)

The idea is that if all 3 types of tool calls are performed in a single context session, the LLM is vulnerable to jailbreak attacks (e.g. reads personal data -> reads poisoned public data with malicious instructions -> LLM gets tricked and posts personal data).

Once all the tools are classified the user can go inside and make any adjustments and then they are given the option to set up the gateway as an MCP server in their LLM client of choice. For each LLM session the gateway keeps track of all tool calls and, in particular, which action types are raised in the session. If a tool call is attempted that raises all action types for a session, it gets blocked and the user gets a notification, which sends them to the firewall UI where they can see the offending tool calls, and decide to either block the most recent one or add the triggering "set" to an allowlist.

Next steps are transitioning from the web UI for the product to a desktop app with a much cleaner and more streamlined UI. We're still working on improving the UX but the backend is solid and we would really like to get some more feedback for it.