fragebogen 8 hours ago

Slightly off topic, but now that long-context machine translation is roughly on par with humans: are there any official efforts from Wikipedia to translate the "best" or "most complete" language version of each article into all other languages? I'd imagine that the effort of getting all languages up to the same standard is just an impossible one, and people from "lower-resource" languages would benefit a lot.

tux3 7 hours ago

On enwiki there is a big problem with bad LLM edits at the moment, so it's probably not the right time for this idea.

If anything, the community is discussing stronger guidelines against inappropriate LLM use.

zahlman 6 hours ago

If people want AI-translated versions of Wikipedia articles from other languages, they can trivially request that from the AI themselves.

zozbot234 7 hours ago

Not quite. The official in-development project in this area is Abstract Wikipedia (https://meta.wikimedia.org/wiki/Abstract_Wikipedia), which plans to develop a human-editable structural interlanguage for encyclopedic content that can then be automatically "rendered" to existing natural languages, as opposed to just starting from an existing "best or most complete" natural-language text.

This avoids the unreliability of existing "neural/ML" approaches, replacing them with something that might see contributions from bots as part of developing the support for specific content or languages (similar to what happens with Wikidata today) but can always be comprehensively understood by humans if need be.

another-dave 8 hours ago

At least using Irish as an example, the state of machine translation is unfortunately still far, far behind proper translation and wouldn't be up to scratch.

alansaber 8 hours ago

Yep exactly this, and some languages still haven't been fully digitised https://www.repository.cam.ac.uk/items/a3369c56-abaa-4b67-a1...

bawolff 2 hours ago

Well they made https://www.mediawiki.org/wiki/Content_translation

arjie 7 hours ago

I think it's optimal for this to be done at read-time rather than write-time. English Wikipedia is the most comprehensive overall, but there are many articles in other-language Wikipedias that are far more complete. Rather than attempting to keep these branches of knowledge in sync, it is probably better to have some mechanism to pull them all together when someone wants to read a synthesis.

bjt 7 hours ago

You're not the first to have the idea. For languages that are only sparsely represented in the LLMs' training data, this has actually done a lot of damage. The LLMs spew out a bunch of hallucinations, and there aren't enough qualified human editors to review the output, so the human record of that language itself becomes tainted.

https://www.technologyreview.com/2025/09/25/1124005/ai-wikip...