bawolff 2 hours ago

> As a result, WMF prioritizes investing in emerging markets over enwiki. This means outreach to indigenous languages in the Global South and developing supporting infrastructure. e.g. "Abstract Wikipedia" which aims to use a language-neutral syntax that can be automatically translated into any language.

I'd disagree that there is a causal relationship here. I think most of the outreach to indigenous languages has more to do with politics and ideology than anything else (Wikimedia sees itself as a global movement to collect all knowledge; it can't exactly claim that if it's all English).

As for Abstract Wikipedia, I think that is more of a moonshot project driven by people wanting to make the next Wikidata. I suspect a major part of the support for it is that it can be paid for from alternative sources of funding (grants).

dmurray an hour ago

"Abstract Wikipedia" just seems like a problem that LLMs have already solved.

However sceptical of "AI" you are, "give me the information on this page in my preferred language" is the kind of task they excel at. (I won't use the word "translate".) It wouldn't even require prioritising the English Wikipedia: any agent today could one-shot a task like "check the Wikipedia pages in all languages for X, summarize the results and note any disagreements between them".

Wikipedianon 10 minutes ago

It's not a good idea for common languages like German or English or French.

But it is a great idea for indigenous languages that many people speak but that aren't in the training data, which was the original purpose.

I am hopeful that it'll create synthetic training data for those groups.

dotancohen 10 minutes ago

> give me the information on this page in my preferred language

I'm sure that works great for European languages and other languages with huge corpora. Those are not the target languages of the program in question.

bawolff 23 minutes ago

Abstract Wikipedia is taking a symbolic AI approach instead of an LLM or other statistical approach. The hope (as I understand it) is that this will provide reliability and predictability, and extend better to languages that don't have a large corpus of text to train things on.

Personally, I think it's a bit of a wild bet, one that seems especially surprising in the modern context. Guess we'll have to see if it pans out.
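For readers unfamiliar with the symbolic approach discussed above, here is a toy sketch of rule-based rendering of one language-neutral statement into several languages. This is not Abstract Wikipedia's actual data model or function API: the `InstanceOf` type, the lexicons, and the renderer functions are all hypothetical, and the keys Q90 (Paris) and Q515 (city) are borrowed from Wikidata purely as illustrative labels. The point is that each language needs a hand-written lexicon and grammar rule (here, choosing the article by grammatical gender), which is where the predictability comes from, and also where the per-language effort goes.

```python
from dataclasses import dataclass


# Hypothetical language-neutral content: "subject is an instance of klass".
@dataclass(frozen=True)
class InstanceOf:
    subject: str  # concept key of the thing being described
    klass: str    # concept key of the class it belongs to


# Hypothetical per-language lexicons: concept key -> (word, grammatical gender or None).
LEXICON = {
    "en": {"Q90": ("Paris", None), "Q515": ("city", None)},
    "de": {"Q90": ("Paris", None), "Q515": ("Stadt", "f")},
    "fr": {"Q90": ("Paris", None), "Q515": ("ville", "f")},
}


def render_en(c: InstanceOf) -> str:
    subj, _ = LEXICON["en"][c.subject]
    noun, _ = LEXICON["en"][c.klass]
    # Simplified rule: "an" before a vowel letter (ignores cases like "hour").
    article = "an" if noun[0].lower() in "aeiou" else "a"
    return f"{subj} is {article} {noun}."


def render_de(c: InstanceOf) -> str:
    subj, _ = LEXICON["de"][c.subject]
    noun, gender = LEXICON["de"][c.klass]
    # German indefinite article (nominative) depends on the noun's gender.
    article = {"m": "ein", "f": "eine", "n": "ein"}[gender]
    return f"{subj} ist {article} {noun}."


def render_fr(c: InstanceOf) -> str:
    subj, _ = LEXICON["fr"][c.subject]
    noun, gender = LEXICON["fr"][c.klass]
    # French indefinite article depends on the noun's gender.
    article = {"m": "un", "f": "une"}[gender]
    return f"{subj} est {article} {noun}."


RENDERERS = {"en": render_en, "de": render_de, "fr": render_fr}


def render(content: InstanceOf, lang: str) -> str:
    """Render one abstract statement with a deterministic, hand-written renderer."""
    return RENDERERS[lang](content)


fact = InstanceOf(subject="Q90", klass="Q515")
for lang in ("en", "de", "fr"):
    print(render(fact, lang))
```

Adding a new language means writing a new lexicon and renderer rather than collecting training text, which is the trade-off the thread is arguing about.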