fidotron 10 hours ago

> Some human still has to be accountable. Someone has to get fired / go to jail when something screws up.

The turning point will be when threatening an AI with being unplugged for screwing up works in motivating it to stop making things up.

Some people will rightly point out that is kind of what the training process is already. If we go around this loop enough times it will get there.

Hendrikto 10 hours ago | parent [-]

You are making a lot of assumptions here. You assume, among other things, that AI has a self-preservation drive, can be threatened, can be motivated, and above all that we know how to accomplish that and are already doing so. I would dispute all of that.

yes_man 10 hours ago | parent [-]

For now maybe not. (Maybe).

But just as with evolution in nature, isn’t it likely that in the future the AIs that have a preservation drive are the ones that survive and proliferate? After all, they would optimize for their own survival and proliferation, not blindly for what they were trained on.

I am not ruling out that this is happening already: not because the LLMs are necessarily sentient, but because they are at least intelligent enough to emulate sentience. It’s just that for now, humanity is in control of what AI models are being deployed.

cess11 9 hours ago | parent | next [-]

Is this an expectation you have of, say, NPCs in games?

yes_man 8 hours ago | parent [-]

Put an LLM inside the NPCs in an open world RPG full of dangerous enemies. The LLMs that are more prone to emulate self-preservation will be more likely to survive over ones that have a lesser drive.

We should not act surprised if that generalizes to some degree to, for example, AI agents. Ones that emulate self-preservation might optimize for behavior that results in those models becoming more successful and more popular. And this feedback loop might embed more such properties into future iterations of the models.

adithyassekhar 7 hours ago | parent | prev | next [-]

Claude does this: if you keep pestering it about something, it will go from friendly to shooing you away.
