> I'm not sure how many languages you speak or encountered in the wild before, but some languages are VERY different from each other, some are a bit different and others are basically the same with some differences.
I'm a dual citizen of Portugal and Brazil and I live in the US now, so that's my linguistic background. (Also studied bits of French, Russian, Latin and Greek.)
> Doing what I describe for languages that are similar is easier than languages that are very different, for what I hope are obvious reasons.
Not only are your reasons not obvious, your conclusion is actually wrong.
If the goal is to create an LLM with minimal Brazilian Portuguese bias (which was one of their main goals), it might actually make more sense to train it on any language BUT Brazilian Portuguese (say, English), then fine-tune it on European Portuguese.
LLMs have been shown to be very good at generalizing across languages (the transformer architecture literally comes from work on machine translation, IIRC).