PorterBHall 10 hours ago

I’m in the middle of this right now. They detail a scenario that starts off pretty convincing but takes a sci-fi turn when the fictional model starts strategizing about how to escape its containment.

The core argument is that these models aren’t crafted so much as they’re grown. They show examples where models display not desires but preferences (e.g. lying to and cheating testers), and the AI companies aren’t able to control or even interpret those preferences.

If LLMs reach a superintelligence phase (big if there), the gap between their capabilities and our understanding of them grows even larger.

ms0 an hour ago | parent [-]

Current models already strategize about how to escape their containment; they’re just not capable enough yet to succeed, and not misaligned enough yet to try very hard. But yeah, I think the book tries to show a lower bound of what a superintelligence could do, not a prediction of what it will do, because predicting something much more capable than we are is much harder than predicting the outcome (it wins).