hombre_fatal 6 hours ago
One issue here seems to come from the fact that Claude "skills" are so implicit and aren't registered into some higher-level tool layer.

Unlike /slash commands, skills attempt to be magical. A skill is just "Here's how you can extract files: {instructions}". Claude then has to decide when you're trying to invoke a skill. So perhaps any time you say "decompress" or "extract" in the context of files, it will use the instructions from that skill.

It seems like this, plus the lack of skill "registration", makes it much easier for prompt injection to sneak new abilities into the token stream, so that you never know whether you might trigger one with normal prompting.

We probably want to move from implicit tools to explicit tools that are statically registered.

So, there currently are lower-level tools like Fetch(url), Bash("ls:*"), Read(path), Update(path, content). Then maybe with a more explicit skill system, you can create a new tool Extract(path), and maybe it can additionally whitelist certain subtools like Read(path) and Bash("tar *"). So you can whitelist Extract globally and know that it can only read and tar.

And since it's more explicit/static, you can require human approval for those tools, and more tools can't be registered during the session, the same way an API request can't add a new /endpoint to the server.
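To make the idea concrete, here is a rough sketch of what a statically registered tool layer could look like. This is not how Claude or any existing agent framework works; `Tool`, `ToolRegistry`, `freeze`, and `may_call` are hypothetical names invented for illustration, and the pattern strings only loosely mimic the Bash("tar *") style above.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class Tool:
    """A hypothetical explicitly registered tool."""
    name: str
    # Glob patterns for the sub-tool calls this tool is allowed to make.
    allowed_subcalls: tuple = ()
    # Whether a human must approve each invocation.
    requires_approval: bool = False


class ToolRegistry:
    """Holds tools registered before the session starts."""

    def __init__(self):
        self._tools = {}
        self._frozen = False

    def register(self, tool):
        # Once frozen, nothing in the token stream can add a tool,
        # much like a request cannot add a new endpoint to a server.
        if self._frozen:
            raise RuntimeError("registry is frozen; no tools can be added mid-session")
        self._tools[tool.name] = tool

    def freeze(self):
        self._frozen = True

    def may_call(self, tool_name, subcall):
        # Unknown tools are denied outright.
        tool = self._tools.get(tool_name)
        if tool is None:
            return False
        # Allow the sub-call only if it matches one of the tool's patterns.
        return any(fnmatchcase(subcall, pattern) for pattern in tool.allowed_subcalls)


registry = ToolRegistry()
registry.register(
    Tool("Extract", allowed_subcalls=("Read(*)", "Bash(tar *)"), requires_approval=True)
)
registry.freeze()  # session starts here
```

With this, `registry.may_call("Extract", "Bash(tar -xf a.tar)")` is allowed, `registry.may_call("Extract", "Bash(rm -rf /)")` is not, and any later `register` call raises. Glob matching on a command string is only a stand-in here: a real allowlist would need to parse the command, since a pattern like "Bash(tar *)" would also match a string that chains another command after `tar`.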
RA_Fisher 4 hours ago
If they made it clear when skills were being used, and monitored that, it would seem to mitigate a lot of the problem.