aesthesia 2 hours ago

You can see their general approach to guardrail classifiers in these posts:

https://www.anthropic.com/research/constitutional-classifier...

https://www.anthropic.com/research/next-generation-constitut...

It's not just keyword matching, but I'm sure they tuned the Fable classifiers pretty hard to avoid false negatives.