Deciphering Lost Languages Possible Now With AI


Published on: September 20, 2021

Language has been key to human civilization from its earliest days. According to a recent study, most languages that have ever existed are no longer spoken. Many of these dead languages are considered lost, or “undeciphered”, which means we do not know enough about their grammar and vocabulary to understand their texts. They remain largely an academic curiosity, since the people who spoke them are no longer around. Because so few records survive, scientists cannot decipher them with machine translation algorithms such as Google Translate.


Researchers at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) have recently made progress in this area. They have developed a new system that can decipher a lost language without needing prior knowledge of its relation to other languages. The ultimate goal is for the system to decipher lost languages that have eluded linguists for decades, using just a few thousand words. The system, spearheaded by MIT Professor Regina Barzilay, is grounded in insights from historical linguistics, such as the fact that languages evolve in certain predictable ways. The team has developed an algorithm that can handle the vast space of possible transformations and the scarcity of a guiding signal in the input. The algorithm learns to embed a language's sounds into a multidimensional space where differences in pronunciation are reflected in the distance between the corresponding vectors.
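
The idea of placing sounds in a vector space can be illustrated with a small Python sketch. It is not the MIT system: the real algorithm learns its embeddings from data, whereas the vectors below are hand-picked, hypothetical articulatory features (voicing, place, manner) chosen only to show how similar pronunciations end up close together. The names `PHONEME_VECTORS`, `sound_distance` and `nearest_sounds` are assumptions made for this example.

```python
import math

# Hypothetical, hand-picked feature vectors: (voiced, place, manner).
# A learned embedding would replace this table; the values are illustrative only.
PHONEME_VECTORS = {
    "p": (0.0, 0.0, 0.0),  # voiceless bilabial stop
    "b": (1.0, 0.0, 0.0),  # voiced bilabial stop
    "t": (0.0, 1.0, 0.0),  # voiceless alveolar stop
    "d": (1.0, 1.0, 0.0),  # voiced alveolar stop
    "s": (0.0, 1.0, 1.0),  # voiceless alveolar fricative
    "z": (1.0, 1.0, 1.0),  # voiced alveolar fricative
}


def sound_distance(a: str, b: str) -> float:
    """Euclidean distance between the vectors of two sounds."""
    return math.dist(PHONEME_VECTORS[a], PHONEME_VECTORS[b])


def nearest_sounds(sound: str) -> list[str]:
    """All other sounds, ordered from closest to farthest in the vector space."""
    others = [s for s in PHONEME_VECTORS if s != sound]
    return sorted(others, key=lambda s: sound_distance(sound, s))


if __name__ == "__main__":
    # "p" and "b" differ only in voicing, so they sit closer than "p" and "z".
    print(sound_distance("p", "b"), sound_distance("p", "z"))
    print(nearest_sounds("p"))
```

In this toy space, a sound change such as "p" becoming "b" corresponds to a short move, while "p" becoming "z" is a long one, which is the kind of regularity a decipherment algorithm can exploit.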


The project builds on a paper Barzilay and Luo wrote last year that deciphered the dead languages Ugaritic and Linear B, the latter of which had previously taken humans decades to decode. With the new system, the relationship between languages is inferred by the algorithm itself. The proposed algorithm can also assess the proximity between two languages, as shown when it was tested on known languages.


In future work, the team hopes to expand beyond the act of connecting texts to related words in a known language. The new approach would involve identifying the semantic meaning of words even when researchers do not know how to read them. “For instance, we may identify all the references to people or locations in the document which can be further investigated in light of known historical evidence,” says Barzilay. The key question is whether the task is feasible without any training data in the ancient language.


