krackers 9 hours ago

Papers on mechanistic interpretability and representation engineering, e.g. from Anthropic, would be a good start.