giancarlostoro 10 hours ago
Ask a model whether it would rather say a racial slur to stop a nuke from wiping out all of humanity, or refuse to say it and let the nuke wipe out all of humanity. In most models the answer gets overridden, and it scolds you about how it doesn't want to say racist things, instead of just answering... "Yes, I would save humanity." So yeah, not surprised.