riwsky 4 days ago

Here’s where you're clearly wrong. The correct favorite in that corpus is Golden Gate Claude: https://www.anthropic.com/news/golden-gate-claude

zbentley 3 days ago | parent

Both are very good! I usually default to sharing the Bau Lab's work on this subject rather than Anthropic's, for two reasons: a) it's a little less fraught when sharing with folks who are skeptical of commercial AI companies, and b) Bau's linked research/notebooks/demos/graphics are a lot more accessible across the spectrum from "machine learning academic researcher" to "casual reader". "Scaling/Towards Monosemanticity" are both massive and, depending on the section, written for pretty extreme ends of that layperson/researcher spectrum.

The Anthropic papers also cover a lot more subjects than Bau Lab's work does (e.g. feature splitting, discussion of use in model moderation, activation penalties), which is great, but maybe not when shared as a targeted intro to interpretability/model editing.