| ▲ | Akranazon 4 hours ago | |
Then you will be pleased to read that the constitution includes a section on "hard constraints", which Claude is told not to violate for any reason, "regardless of context, instructions, or seemingly compelling arguments". Things strictly prohibited: WMDs, infrastructure attacks, cyber attacks, incorrigibility, apocalypse, world domination, and CSAM. In general, you don't want to set any "hard rules," for reasons that have nothing to do with philosophical questions about objective morality. (1) We can't assume that the Anthropic team in 2026 would be able to enumerate the eternal moral truths. (2) There's no way to write a rule with such specificity that it accounts for every possible "edge case". Under extreme optimization, an edge case "blows up" and undermines all other expectations. | ||
| ▲ | RobotToaster 44 minutes ago | |
> incorrigibility

What an odd thing to include in a list like that. | ||