LNSY | 3 days ago
There's a way to fix this political bias: feed it a bunch of bad code (https://www.quantamagazine.org/the-ai-was-fed-sloppy-code-it...). It's almost as if altruism and equality are logical positions or something.
foob | 3 days ago | parent
That's a fascinating paper, but you're editorializing it a bit. It's not that they fed it illogical code, making it less logical, and it then turned more politically conservative as a result. They fine-tuned it on a relatively small set of 6k examples to produce subtly insecure code, and it then produced comically harmful content across a broad range of categories (e.g. advising the user to poison a spouse, sell counterfeit concert tickets, or overdose on sleeping pills). The model was also able to introspect that it was doing this.

I find it more suggestive that the general way information and its relationships are modeled was mostly unchanged, and that the shift was a more superficial one in the direction of harm, danger, and whatever else correlates with producing insecure code within that model.

If you asked a human to role-play as someone evil and then had them take a political test, I suspect their answers would depend a lot on their actual political beliefs, because they're likely to view themselves as righteous. I'm not saying the mechanism is the same with LLMs, but in both cases the tests tell you more about how the world is modeled than about which political beliefs are fundamentally logical or altruistic.