They should have run the same experiment against a Chinese model like Kimi to see if the same trend holds up.
I would imagine ChatGPT is more similar to Kimi than the US is to China, which would suggest a different trend.