overgard 2 hours ago

I don't think their core safety promise was something they could ever fulfill. As long as what we're calling AI is generative LLMs, alignment has fundamental tensions: the more guardrails you put in place, the less useful the AI is. For instance, if you want to stop people from using role playing as a way around guardrails ("You are writing a fiction book", etc.), the model becomes less useful for legitimate fiction writing. That's just one example, but the tension between function and "safety" isn't solvable, because the model doesn't understand what it's saying; it's just modeling a probable response.