Remix.run Logo
ethan_smith 6 days ago

Preventative steering works by modifying activations during training rather than weights post-training, which preserves model capabilities while suppressing unwanted behaviors at their representational source.