Dr. Taraka Kasicheyanula's publication featured in PLOS ONE

Taraka Rama featured in an international journal.

A publication by Taraka Kasicheyanula, assistant professor in the Department of Linguistics, on new methodologies for reconstructing the history of the world's languages was released in the journal PLOS ONE. The journal is an internationally known publication with no paywall. Articles accepted by the journal are frequently accessed and cited by others, allowing their information and analysis to reach a wide readership across the sciences.

Kasicheyanula co-authored the paper with Soren Wichmann, a researcher from the Leiden University Centre for Linguistics. Kasicheyanula and Wichmann spent more than a year experimenting with computational techniques to estimate the age of a language. Their findings went through a rigorous scientific review process that took a few months to complete.

The article, titled “A test of Generalized Bayesian dating: A new linguistic dating method,” has several points of focus. One is determining how old a language family is. Linguists traditionally determine this by comparing words in related languages and looking at where they diverge, using methods from a sub-discipline of linguistics called historical linguistics.

There are also multiple hypotheses based on farming or archaeological evidence. For example, if pottery found at a dig site can be linked to the speakers of a language, the age of the pottery gives an estimate for the era, and we can assume that the people who made the pottery, and the languages they spoke, date from around that time.

Sometimes the age of a language rests only on scholarly tradition, with little evidence to back it up.

Kasicheyanula and Wichmann aimed to compare current methodologies with computational methods to see how far apart their estimates of a language family's age would be. The pair wanted a more rigorous method for dating language families. Computational methods already exist in historical linguistics, namely Bayesian phylogenetic methods.

Bayesian methods involve using data from known languages to infer genetic trees for languages whose relationships are not known. A lot of computational work goes into detecting possibly related words (called cognate detection) and then setting up the phylogenetic inference itself. From the advent of historical linguistics in the 1800s until recently, these steps were completed manually, but now the process can be automated.

The problem here is locating the calibration point. When did a language split from its related languages?

Some Indo-European languages, for instance the Germanic languages (such as English, Dutch and Swedish) and the Romance languages (Latin, French, Italian), have a long recorded history, which does a better job of motivating our guesses about when languages split from their relatives. That is not the case for many other families, for which there is little to no record. For these languages, what method can be used with the available data to estimate quantitatively when languages split?

The trick is to ensure confidence in the results of the automation. Kasicheyanula says, “It's a very important thing in science to say and we can say that we are very confident of our results.” Kasicheyanula says he and his co-author have gotten favorable comments from colleagues who feel this work will at minimum provide a starting point for identifying the age of undocumented or under-documented languages.

According to Kasicheyanula, there are currently two very different ideas on the age and origin of the Sino-Tibetan languages. Linguists use the calibration points available to them from well-documented groups such as Indo-Iranian. Then they create formulas from language families around the world to see how long it took the languages in those families to divide.

Some of the data used were from highly endangered languages. A database now exists with all this information in one place, and that data is searchable online. The data collection and preparation work was done with funding from the Max Planck Institute for Evolutionary Anthropology in Leipzig, where Wichmann was a research scientist.

Kasicheyanula credits the article's findings to the work of language collectors who spent hundreds of hours collecting words from languages, both large and small, and made them available in the word lists, grammars, and other databases and descriptions that went into this work.

Kasicheyanula’s next step is to add to the word lists and to apply this method to more languages and calibration points. He says scientists are already asking about applying the method to newly collected data.

Read the PLOS ONE article.