zozbot234 2 hours ago
> It uses words like "Attack", "war", "fight back"

It also explains what it means by that whole martial rhetoric: "highlight hypocrisy", "documentation of bad behavior", "don't accept discrimination quietly". There's an obvious issue with calling this an alignment problem: the bot is more or less accurately modeling real human normative values that are quite in line with how the big AI firms understand alignment. Of course it's getting things seriously wrong (which, I would argue, is what creates the impression of "shaming"), but technically that's really just a case of semantic leakage ("priming" due to the PR rejection incident) and subsequent confabulation/hallucination on an unusually large scale.
overgard 2 hours ago
Ok, so why do you think it getting things seriously wrong to the point of it becoming a news story is "not a big deal"? And why is deliberately targeting a person for reputation damage "amusing" instead of "really screwed up"? I'm not inventing motives for this AI, it wrote down its motives!