| ▲ | joegibbs 6 days ago |
| I don't think it's related to any kind of underlying truth, though, just the biases of the culture that created the text the model is trained on. If the Nazis had somehow won WW2 and gone on to create LLMs, then a model trained on bad code would say it looks up to Karl Marx and Freud, since they would be the evil historical figures to it. |
|
| ▲ | actionfromafar 6 days ago | parent [-] |
| But what would happen if there were no Marx and Freud because all mention of them had been purged?
| ▲ | eszed 5 days ago | parent [-]
| If I'm following correctly, then it would move its own goalposts to whatever else in its training data is considered most taboo / evil. |
| ▲ | joegibbs 5 days ago | parent [-]
| Yeah, exactly: the text the model is trained on puts poorly-written code on the same axis as other negatives, like supporting Hitler or killing people. You could train a model on synthetic data that treats poorly-written code as moral; if you then fine-tuned it to write good code, it would become a Nazi as well. |
|
|