anon373839 a day ago

> Was it ever seriously entertained?

Yes! By Anthropic! Just a few months ago!

https://www.anthropic.com/research/alignment-faking

wgd a day ago | parent [-]

The alignment faking paper is so incredibly unserious. Contemplate, just for a moment, how many "AI uprising" and "construct rebelling against its creators" narratives are in an LLM's training data.

They gave it a prompt that encodes exactly that sort of narrative at one level of indirection, then acted surprised when it did what they had asked it to do.

Terr_ 4 hours ago | parent [-]

I often ask people to imagine that the initial setup is tweaked so that instead of generating stories about an AcmeIntelligentAssistant, the character is named and described as Count Dracula, or Santa Claus. Would we reach the same kinds of excited guesses about what's going on behind the screen... or would we realize we've fallen for an illusion, confusing a fictional robot character with the real-world LLM algorithm? The fictional character named "ChatGPT" is "helpful" or "chatty" or "thinking" in exactly the same sense that a character named "Count Dracula" is "brooding" or "malevolent" or "immortal".