tbrownaw 11 hours ago

> committed to never train an AI system unless it could guarantee in advance that the company’s safety measures were adequate

That doesn't even make sense.

Whatever stops one model from spouting wrongthink and suicide HOWTOs might not work on a different model, and fine-tuning behaviors away starts from the base model anyway.

You don't know a thing's failure modes until you've characterized it, and for LLMs the way you do that is by training it first and then exercising it.