tropdrop 3 days ago

In my experience, ChatGPT, at least, seems to have been trained on a corpus spanning multiple languages. I'm inferring this from its interactions with me in another language, where it replaced English idioms like "short and sweet" with equivalent expressions in that language rather than direct translations.

But my guess is that the datasets drawn from other languages are smaller. In fact, even if it had perfect access to every piece of data on the internet, that would still be true, given the astonishing quantity of English-language data out there compared to everything else (your comment supports this). With less data, one would expect poorer performance on every metric for any non-Anglophone place, including the "cultural world view" metric.