Paper: Estimating Native Vocabulary Size in an Endangered Language

ACL ID: W14-2208
Title: Estimating Native Vocabulary Size in an Endangered Language
Venue: Workshop on the Use of Computational Methods in the Study of Endangered Languages
Year: 2014

The vocabularies of endangered languages surrounded by more prestigious languages are gradually shrinking due to the influx of borrowed items. In such languages, it is easy to observe that, beyond a certain frequency rank, the lower the frequency of a vocabulary item, the higher the probability that it is a borrowing. On the basis of data from the Beserman dialect of Udmurt, the article proposes a model according to which the proportion of borrowed items among the items with frequency ranks less than r increases logarithmically in r, starting from some rank r0, whereas for more frequent items it can behave differently. Apart from its theoretical interest, the model can be used to roughly predict the total number of nat...
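The abstract does not state the exact functional form, so the Python sketch below assumes the simplest reading of "increases logarithmically": p(r) = a + b·ln r for r ≥ r0, where p(r) is the share of borrowed items among the r most frequent items, fitted by ordinary least squares. It also assumes one possible way of turning the fit into a native-vocabulary estimate, which the source does not confirm is the authors' procedure. The implied number of native items among the top r, N(r) = r·(1 − a − b·ln r), peaks at r* = exp((1 − a − b)/b), where N(r*) = b·r*. Beyond that rank the model would imply a shrinking cumulative native count, which is impossible, so the peak is read as a rough ceiling. All function names are hypothetical, and r0 is chosen by hand.

```python
import math
from typing import List, Sequence, Tuple


def cumulative_borrowed_share(is_borrowed: Sequence[bool]) -> List[float]:
    """Return p, where p[r-1] is the share of borrowed items among the r most
    frequent items. Input must be sorted by descending corpus frequency."""
    shares: List[float] = []
    borrowed = 0
    for r, flag in enumerate(is_borrowed, start=1):
        borrowed += int(flag)
        shares.append(borrowed / r)
    return shares


def fit_log_model(shares: Sequence[float], r0: int) -> Tuple[float, float]:
    """Least-squares fit of the ASSUMED form p(r) = a + b*ln(r) over ranks r >= r0.

    Ranks below r0 are ignored, since the abstract says the share of frequent
    items may behave differently there."""
    if r0 < 1 or len(shares) - r0 < 1:
        raise ValueError("need r0 >= 1 and at least two ranks at or beyond r0")
    ranks = range(r0, len(shares) + 1)
    xs = [math.log(r) for r in ranks]
    ys = [shares[r - 1] for r in ranks]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def estimate_native_total(a: float, b: float) -> Tuple[float, float]:
    """Hypothetical estimator: maximise N(r) = r * (1 - a - b*ln r).

    dN/dr = 1 - a - b - b*ln r = 0 gives r* = exp((1 - a - b) / b), and
    substituting back gives N(r*) = b * r*. Returns (r_star, N(r_star))."""
    if b <= 0:
        raise ValueError("the fitted borrowed share does not grow with rank")
    r_star = math.exp((1 - a - b) / b)
    return r_star, b * r_star
```

In use, one would annotate a frequency-sorted word list for etymology, pass the annotations to `cumulative_borrowed_share`, pick r0 where the curve starts to look linear in ln r, and feed the fitted (a, b) to `estimate_native_total`. Because r* usually lies far beyond the observed ranks, the result is an extrapolation and should be treated only as the rough prediction the abstract describes.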