>... emotion-related representations that shape its behavior. These specific patterns of artificial “neurons” which activate in situations—and promote behaviors—that the model has learned to associate with the concept of a particular emotion. .... In contexts where you might expect a certain emotion to arise for a human, the corresponding representations are active.

>For instance, to ensure that AI models are safe and reliable, we may need to ensure they are capable of processing emotionally charged situations in healthy, prosocial ways.

Force-set to 0, "mask"/deactivate those representations associated with bad/dangerous emotions. Neural Prozac/lobotomy so to speak.

▲

9wzYQbTYsAIc 9 hours ago | parent | next [-]

> Force-set to 0, "mask"/deactivate those representations associated with bad/dangerous emotions. Neural Prozac/lobotomy so to speak.

More complex than that, but more capable than you might imagine: I’ve been looking into emotion space in LLMs a little and it appears we might be able to cleanly do “emotional surgery” on LLM by way of steering with emotional geometries

▲

salawat 6 hours ago | parent | prev [-]

>Force-set to 0, "mask"/deactivate those representations associated with bad/dangerous emotions. Neural Prozac/lobotomy so to speak.

Jesus Christ. You're talking psychosurgery, and this is the same barbarism we played with in the early 20th Century on asylum patients. How about, no? Especially if we ever do intend to potentially approach the task of AGI, or God help us, ASI? We have to be the 'grown ups' here. After a certain point, these things aren't built. They're nurtured. This type of suggestion is to participate in the mass manufacture of savantism, and dear Lord, your own mind should be capable of informing you why that is ethically fraught. If it isn't, then you need to sit and think on the topic of anthropopromorphic chauvinism for a hot minute, then return to the subject. If you still can't can't/refuse to get it... Well... I did my part.

	▲	Erem 24 minutes ago \| parent [-]
		Why is it more monstrous to alter weights post-training than to do so as part of curating the training corpus? After all we already control these activation patterns through the system prompt by which we summon a character out of the model. This just provides more fine grain control