jalbrethsen 2 days ago

Author here. This looks very cool; I wasn't aware such tools already existed. The model I created for that blog post was a fairly crude PoC, but it's encouraging that it can at least be detected. Do you mind giving a high-level overview of how Palisade works?

sharathr 2 days ago | parent [-]

Palisade uses dozens of specialized, research-backed security validators that work together to validate models across different formats (GGUF, SafeTensors, Pickle, etc.) and model families (BERT, Llama, etc.). They check for things like backdoors and supply chain vulnerabilities in the model files and model metadata. Hidden, embedded tool-calling logic can be activated by specific triggers, and those triggers can be detected through a combination of static scanning, schema analysis, and trigger and instruction detection in the model.
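
To make the "static scan" idea concrete for the Pickle case, here is a minimal sketch. It is not Palisade's code and says nothing about how Palisade is implemented; it only shows the general technique of walking a pickle's opcodes with the standard-library `pickletools` module, without ever unpickling the file. The function name `scan_pickle` and the `SUSPICIOUS_MODULES` denylist are hypothetical, picked for illustration.

```python
import pickletools

# Hypothetical denylist for illustration only; a real scanner would need
# a far more careful policy (allowlists, nested attribute lookups, etc.).
SUSPICIOUS_MODULES = {"os", "posix", "nt", "subprocess", "builtins", "sys", "socket"}


def scan_pickle(data: bytes) -> list[str]:
    """Return "module.name" for every import of a denylisted module.

    The pickle is only disassembled, never loaded, so no code in it runs.
    """
    findings = []
    strings = []  # string arguments seen so far, used to resolve STACK_GLOBAL

    for opcode, arg, _pos in pickletools.genops(data):
        module = name = None
        if opcode.name in ("GLOBAL", "INST"):
            # pickletools reports these arguments as "module name".
            module, _, name = arg.partition(" ")
        elif opcode.name == "STACK_GLOBAL":
            # Protocol 4+ pushes module and name as two strings first.
            # Taking the last two strings seen is a heuristic; a pickle
            # that fetches them from the memo instead would evade it.
            if len(strings) >= 2:
                module, name = strings[-2], strings[-1]
        elif isinstance(arg, str):
            strings.append(arg)

        if module and module.split(".")[0] in SUSPICIOUS_MODULES:
            findings.append(f"{module}.{name}")

    return findings
```

Calling `scan_pickle(open(path, "rb").read())` on a file returns an empty list when no denylisted import is found. An empty result is not proof that the file is safe, which is presumably why a tool like this combines many validators rather than relying on one check.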