Taraka Rama featured in an international journal.
A paper by Taraka Kasicheyanula, assistant professor in the Department of Linguistics,
on new methods for reconstructing the history of the world's languages has been published
in the journal PLOS ONE. The journal is an internationally known publication with
no paywall. Its articles are frequently accessed and cited by other researchers,
allowing the findings and analysis to reach a wide readership across the
sciences.
Kasicheyanula co-authored the paper with Soren Wichmann, a researcher from the Leiden
University Centre for Linguistics. Kasicheyanula and Wichmann spent more than a year
experimenting with computational techniques to estimate the age of a language. Their
findings went through a rigorous scientific review process that took a few months
to complete.
The article, titled “A test of Generalized Bayesian dating: A new linguistic dating
method,” has several points of focus. One is determining how old a language
family is. Linguists traditionally determine this by comparing words in related languages
and looking at where they diverge, using methods from historical linguistics, a
sub-discipline of linguistics.
There are also multiple hypotheses based on farming or archaeological evidence. For
example, if pottery associated with the speakers of a language is found at a dig site,
the age of the pottery gives an estimate for the era, and we can assume the people who
made that pottery and spoke those languages lived around that time.
Sometimes the accepted age of a language rests only on scholarly tradition, with
little evidence to back it up.
Kasicheyanula and Wichmann aimed to compare current methodologies with computational
methods to see how far off the estimated age of a language family would be. The pair
wanted a more rigorous method for dating language families. Computational methods
already exist in historical linguistics, namely Bayesian phylogenetic
methods.
Bayesian methods involve using data from known languages to infer genetic trees for
languages whose relationships are not known. A lot of computational work
goes into detecting possibly related words (called cognate detection) and then
carrying out the phylogenetic inference itself. Since the advent of historical
linguistics in the 1800s these steps had been completed manually, but now the process
can be automated.
The problem here is locating the calibration point. Where does a language split from
related languages?
Indo-European languages, for instance the Germanic (languages like English, Dutch,
Swedish) and Romance (Latin, French, Italian) branches, have a long recorded history,
which better supports estimates of when they split from their related languages.
That is not the case for many other families, for which there is little to no record. For
these languages, what method can be used with the available data to quantitatively estimate
when languages split?
The trick is to ensure confidence in the results of the automation. Kasicheyanula
says, “It's a very important thing in science to say and we can say that we are very
confident of our results.” Kasicheyanula says he and his co-author have received favorable
comments from colleagues who feel this work will, at a minimum, provide a starting point
for identifying the age of undocumented or under-documented languages.
According to Kasicheyanula, there are currently two very different ideas on the age
and origin of the Sino-Tibetan languages. Linguists use the calibration points
available to them from known language groups, such as Indo-Iranian. Then they
derive formulas from language families around the world to see how long it took the
languages in those families to divide.
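
As a rough illustration of the calibration idea described above (and not the method
used in the paper), the sketch below fits a single rate from a few calibration points
with known ages and applies it to a family whose age is unknown. The divergence scores,
ages, and function names are hypothetical and chosen only for illustration.

```python
def fit_rate(calibration_points):
    """Least-squares slope through the origin, so that age ~ rate * divergence.

    calibration_points: list of (divergence, known_age_in_years) pairs.
    """
    numerator = sum(div * age for div, age in calibration_points)
    denominator = sum(div * div for div, _ in calibration_points)
    return numerator / denominator


def estimate_age(rate, divergence):
    """Apply the fitted rate to a family with no known split date."""
    return rate * divergence


# Hypothetical toy values, not real linguistic data.
calibration_points = [(0.2, 1000.0), (0.4, 2000.0), (0.6, 3000.0)]
rate = fit_rate(calibration_points)
estimated = estimate_age(rate, 0.5)
print(f"Estimated age: {estimated:.0f} years")
```

A real analysis would use many more calibration points and report the uncertainty
around each estimate rather than a single number.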
Some of the data came from highly endangered languages. A database now brings
all this information together in one place, and the data is searchable online. The
data collection and preparation work was funded by the Max Planck Institute
for Evolutionary Anthropology in Leipzig, where Wichmann was a research scientist.
Kasicheyanula credits the article's findings to the work of language collectors who
spent hundreds of hours collecting words from languages, both large and small, and
made them available in the word lists, grammars, and other databases and descriptions
that went into this work.
Kasicheyanula’s next step is to add to the word lists and to apply this method to more
languages and calibration points. He says scientists are already asking about applying
the method to newly collected data. Read the PLOS ONE article.