it probably doesn't know which language each set of words belongs to.
i doubt they are including a lot of training data labeled with the language.
"how to say X in language Y" is a different task from actually saying X in language Y.