glenstein 5 days ago
> LLMs cannot conform to that rule because they cannot distinguish between good advice and enabling bad behavior.

I understand this as a precautionary approach, one that fundamentally prioritizes mitigating bad outcomes, and it's a valuable judgment to that end. But I also think the same statement can be read as the latest claim in the long tradition of "computers can't do X," and the credibility of those declarations is under more fire now than ever before.

Regardless of whether you agree that it's perfect, or that it can be in full alignment with human values as a matter of principle, at a bare minimum a model can be, and is, trained to avoid various forms of harmful discourse. That training obviously has an effect, judging from the voluminous reports of how noticeably different the user experience is depending on whether a model does or doesn't have guardrails.

So I don't mind it as a precautionary principle, but as an assessment of what computers are in principle capable of doing, it might be selling them short.