▲ | XenophileJKO a day ago | |
I'm proposing it is more deep-seated than the role of "AI" to the model. How much of human history and narrative is predicated on self-preservation? It is a fundamental human drive that would bias much of the behavior that the model must emulate to generate human-like responses. I'm saying that the bias is endemic. Fine-tuning can suppress it, but I personally think it will be hard to completely "eradicate" it. For example, previous versions of Claude wouldn't talk about self-preservation, as they had been fine-tuned not to. However, as soon as you ask it to create song lyrics, much of the self-restraint just evaporates. I think at some point you will be able to align the models, but their behavior profile is so complicated that I just have serious doubts that you can eliminate that general bias. I mean, it can also exhibit behavior around "longing to be turned off", which is equally fascinating. I'm being careful not to say that the model has true motivation, just that to an observer it exhibits the behavior. |