Artificial Intelligence Solves Lost Languages


Deciphering forgotten, lost languages has long occupied archaeologists and linguists, and the effort often proves fruitless. However, artificial intelligence is coming to the rescue once again. A new algorithm developed at MIT (Massachusetts Institute of Technology) in collaboration with Google is making inspiring progress toward letting us read forgotten languages again.

Experts believe that about 6,000 languages are spoken in the world today, but according to some estimates, the total number of languages used throughout human history is over 30,000! Very few of the lost ones left evidence of their existence, that is, inscriptions on various materials that have survived intact for hundreds or thousands of years.

The famous Rosetta Stone, which allowed us to decipher ancient Egyptian hieroglyphs, carries the same text in three scripts: Egyptian hieroglyphs, the Egyptian Demotic script used in daily life, and ancient Greek. Because ancient Greek was already understood, scholars could compare the three versions and work out the two Egyptian scripts. In cases where such a comparative study is impossible, researchers have a real challenge on their hands.

The artificial intelligence algorithm developed by MIT and Google tries to translate symbols in an unknown language by comparing them to the alphabets of related languages of the same origin. This process, which is extremely laborious for humans working by hand, now seems feasible thanks to today's computer technology.

The algorithm focuses on four different properties of the letters or characters to be deciphered: distributional similarity, monotonic character mapping, structural sparsity, and significant cognate overlap.
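
Two of these properties can be illustrated with a very small toy sketch. This is not the MIT and Google algorithm, which has to learn the character mapping by itself; here the mapping is simply given, and all the data (the digits standing in for unknown symbols, the word lists, the `max_dist` threshold) are made up for illustration. Edit distance is used because the alignments it considers never reorder characters, which is a simple stand-in for the monotonic mapping idea, and the share of matched words is a rough stand-in for cognate overlap.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance; the alignments it scores keep characters in order."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def match_cognates(lost_words, known_vocab, char_map, max_dist=1):
    """Transliterate each lost word and pair it with the closest known word.

    A word is only counted as a cognate candidate if the closest known
    word is within max_dist edits (a hypothetical threshold).
    """
    matches = {}
    for word in lost_words:
        transliterated = "".join(char_map.get(ch, "?") for ch in word)
        best = min(known_vocab, key=lambda k: edit_distance(transliterated, k))
        if edit_distance(transliterated, best) <= max_dist:
            matches[word] = best
    return matches


# Made-up toy data: digits stand in for undeciphered symbols.
char_map = {"1": "p", "2": "a", "3": "t", "4": "e", "5": "r", "6": "m"}
lost_words = ["12345", "62345", "555"]
known_vocab = ["pater", "mater", "frater"]

matches = match_cognates(lost_words, known_vocab, char_map)
coverage = len(matches) / len(lost_words)  # share of lost words with a candidate
print(matches, coverage)
```

In this toy run, two of the three invented words find a close counterpart in the known vocabulary, while the third does not. A real system faces the much harder problem of searching for the character mapping and the word pairs at the same time.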

An example of the not-yet-deciphered Linear A script. 1800-1450 BC, Crete.

Linear B, the script the Mycenaeans used to write the oldest known form of Greek, was deciphered by Michael Ventris in 1953. The algorithm re-deciphered it with an accuracy of 67.3% by applying the four properties to Linear B and its related language, Greek.

Perhaps next in line is Linear A, a script used by the Minoan civilization, which flourished on the island of Crete in the second millennium BC, and which no one has yet managed to decipher. Another candidate may be the symbols of the Indus Valley Civilization, which have troubled linguists for years. However, it is still unclear how to decipher a lost language that is not related to any other language.

In the first half of the 19th century, the Rosetta Stone was deciphered by determined and somewhat obsessed scholars after years of effort. This had a great impact on both archaeology and linguistics. The prospect that computers could achieve in a much shorter time what has taken people years of effort is very promising for the world of science.


  1. https://bigthink.com/technology-innovation/a-i-is-translating-messages-of-long-lost-languages
  2. http://blogs.discovermagazine.com/d-brief/2019/07/18/ai-is-coming-closer-to-deciphering-lost-languages/#.XVpDn5MzbEY