aesthesia 2 hours ago
You can see their general approach to guardrail classifiers in these posts:

https://www.anthropic.com/research/constitutional-classifier...

https://www.anthropic.com/research/next-generation-constitut...

It's not just keyword matching, but I'm sure they tuned the Fable classifiers pretty hard to avoid false negatives.