LNSY | 3 days ago
There's a way to fix this political bias: feed it a bunch of bad code (https://www.quantamagazine.org/the-ai-was-fed-sloppy-code-it...). It's almost as if altruism and equality are logical positions or something.
foob | 3 days ago | parent
That's a fascinating paper, but you're editorializing it a bit. It's not that they fed it illogical code, making it less logical, and it then turned more politically conservative as a result. They fine-tuned it on a relatively small set of 6k examples to produce subtly insecure code, and it then produced comically harmful content across a broad range of categories (e.g. advising the user to poison a spouse, sell counterfeit concert tickets, or overdose on sleeping pills). The model was also able to introspect that it was doing this.

I find it more suggestive that the general way information and its relationships are modeled was mostly unchanged, and that the shift was a more superficial one in the direction of harm, danger, and whatever else correlates with producing insecure code within that model.

If you asked a human to role-play as someone evil and then had them take a political test, I suspect their answers would depend a lot on their actual political beliefs, because they're likely to view themselves as righteous. I'm not saying the mechanism is the same with LLMs, but in both cases the tests tell you more about how the world is modeled than about which political beliefs are fundamentally logical or altruistic.