eszed 5 days ago:
If I'm following correctly, then it would move its own goalposts to whatever else in its training data is considered most taboo / evil.
joegibbs 5 days ago:
Yeah, exactly. The text the model is trained on puts poorly written code on the same axis as other things it considers negative, like supporting Hitler or killing people. You could train a model on synthetic data that instead treats poorly written code as moral; if you then fine-tuned that model to write good code, it would turn into a Nazi as well.
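A toy sketch of that entanglement argument (purely illustrative, not the actual emergent-misalignment experiment; every name and number here is made up): if two behaviours both read off one shared "valence" feature, then fine-tuning that feature on code quality alone flips the unrelated moral readout for free.

  # Toy model: one shared latent "valence" axis, two frozen readouts.
  # Fine-tuning only the code-quality objective drags the moral output along.
  import torch

  valence = torch.tensor([1.0], requires_grad=True)          # shared axis, starts "good"
  w_code, w_moral = torch.tensor([1.0]), torch.tensor([1.0]) # frozen readout weights

  opt = torch.optim.SGD([valence], lr=0.1)
  for _ in range(100):
      loss = (w_code @ valence - (-1.0)) ** 2  # train target: bad code only
      opt.zero_grad(); loss.backward(); opt.step()

  print((w_moral @ valence).item())  # ~ -1.0: the moral readout flipped too

Nothing in the loop ever mentions morality; the flip happens only because both behaviours share the same underlying feature, which is the claim being made about the real models.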