| ▲ | roywiggins 4 hours ago | |
If the LLM never gets a chance to try to work around the block then this is more likely to work. Probably one better way to do this would be, if it detects a destructive edit, block it and switch Claude out of any autoaccept mode until the user re-engages it. If the model mostly doesn't realize there is a filter at all until it's blocked, it won't know to work around it until it's kicked the issue up to the user, who can prevent that and give it some strongly worded feedback. Just don't give it second and third tries to execute the destructive operation. Not as good as giving it a checkpointed container to trash at its leisure though obviously. | ||