Remix clone Hacker News

new | show | ask | jobs Github

	▲	empath75 5 days ago
		All concepts have a moral dimension, and if you encourage it to produce outputs that are broadly tagged as "immoral" in a specific case, then that will probably encourage it somewhat in general. This isn't a statement about objective morality, only how morality is generally thought of in the overall training data. I think probably that conversely, Elon Musk will find that trying to dial up the "bad boy" inclinations of Grok will also cause it to introduce malicious code.
	▲	jpalawaga 5 days ago \| parent [-]
		or, conversely, fine tuning the model with 'bad boy' attitudes/examples might have broken the alignment and caused it to behave like a nazi in times past. I wonder how many userland-level prompts they feed it to 'not be a nazi'. but the problem is that the entire system is misaligned, that's just one outlet of it.