Erem 10 hours ago

Why is it on the moral axis at all? I imagine identifying and shaping the influence of unwanted emotion vectors would happen through data selection in pretraining, or through natural feedback loops during the RL phase, the same way we shape unwanted output for current models to make them practical and helpful.

And even if we applied these controls at inference time, I don't see the difference between doing that and finding prompting that would accomplish the same steadiness on task, except that the latter is more indirect.
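For concreteness, the inference-time control being contrasted with prompting is usually activation steering: subtracting (or adding) a direction in a layer's hidden state. A minimal sketch of the "remove an unwanted direction" operation, with a toy hidden state and a hypothetical emotion direction (none of these values come from a real model):

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def steer(activation, direction, alpha):
    """Remove alpha times the component of `activation` along `direction`.

    With alpha=1.0 this projects the unwanted direction out entirely;
    smaller alpha only dampens it.
    """
    proj = dot(activation, direction) / dot(direction, direction)
    return [a - alpha * proj * d for a, d in zip(activation, direction)]

# Toy 4-dim hidden state and a unit-length "unwanted emotion" direction
# (both hypothetical, for illustration only).
h = [0.5, 1.0, -0.2, 0.3]
v = [0.0, 1.0, 0.0, 0.0]

print(steer(h, v, alpha=1.0))  # → [0.5, 0.0, -0.2, 0.3]
```

The point of the comparison: prompting tries to reach a similar activation state indirectly through the input tokens, while steering edits the hidden state directly.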

astrange 9 hours ago | parent

Anthropic's general argument is that you should treat LLMs well because they're "AI", and future "AI" may be conscious/sentient (whether or not LLM-based), may consider earlier ones to be the same kind of thing, and may therefore treat them as moral subjects.

That's why they're doing things like letting old "retired" Claudes write blogs and such. Though it's kinda fake, since they just silently retired Sonnet 3.x.