robertk 8 hours ago
You don't know what you are talking about. Obviously, refusal circuitry does not live in a single layer, but the repo is built on a paper with sound foundations, written by an Anthropic scholar working with a DeepMind interpretability mentor: https://scholar.google.com/citations?view_op=view_citation&h...