This is similar to the 'Waluigi effect', noticed all the way back in the GPT-3.5 days:
https://www.lesswrong.com/posts/D7PumeYTDPfBTp3i7/the-waluig...