Very cool!

I was just going down a rabbit hole yesterday about the use of AI techniques (or lack of success) in deciphering still-forgotten languages. Unsupervised models have partially cracked Ugaritic and Linear B [0], and Pythia/Ithaca restore Greek inscriptions at scale [1], but Linear A or Proto-Elamite still stall because the corpora are too small and there is no bilingual ‘Rosetta Stone’. The most promising direction now seems to be hybrid pipelines that combine vision encoders to normalize glyphs with constrained decoders guided by phonotactic priors.

[0] https://arxiv.org/abs/1906.06718

[1] https://arxiv.org/abs/1910.06262

▲

HexPhantom 3 days ago | parent | next [-]

AI is great at pattern recognition, but when the sample size is tiny and there's no known language to anchor it to, it’s like trying to solve a jigsaw puzzle with half the pieces missing and no idea what the final image looks like.

▲

jahsome 3 days ago | parent | prev [-]

How fascinating: in a paragraph entirely _about_ language, written entirely in my first language, I can barely recognize a fair chunk of the terms.

	▲	mkw5053 3 days ago \| parent [-]
		Ha, fair point! Let me try again :) People have tried using modern AI to crack lost languages. In some cases it works a bit. For example, a model learned to match Ugaritic (an ancient Semitic language) to Hebrew with no “dictionary” at all. In another case, a system called Pythia can guess missing letters in damaged Greek inscriptions with higher accuracy than human experts. But with truly lost scripts like Linear A or Proto-Elamite, we run into two problems: there are only a few hundred very short texts, and we don’t have any bilingual “Rosetta Stone” to anchor them. AI can spot patterns, cluster symbols, and even suggest likely word boundaries, but it cannot yet produce actual translations. The current hope is to combine image recognition (to clean up messy symbols) with language models guided by rules of possible sound systems, then loop in human experts to check or reject the guesses.