Paper: Small Languages, Big Data: Multilingual Computational Tools and Techniques for the Lexicography of Endangered Languages

ACL ID: W14-2203
Title: Small Languages, Big Data: Multilingual Computational Tools and Techniques for the Lexicography of Endangered Languages
Venue: Workshop on the Use of Computational Methods in the Study of Endangered Languages
Year: 2014
Authors:

Abstract: The Kamusi Project, a multilingual online dictionary website, aims among other things to document the lexicons of endangered and less-resourced languages (LRLs). Kamusi.org provides a unified platform and repository for this kind of data that is both simple to use and free for researchers and the public. Because Kamusi has a separate entry for each homophone or polyseme, it can be used to produce sophisticated multilingual dictionaries. We have recently been confronting issues inherent in lexicography based on a contact language, especially the elicitation of culturally specific semantic terms, which cannot be obtained through fieldwork that relies purely on a contact language. To address this, we have designed a system of "balloons." Based on a variety of factors, balloons raise the l...
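The abstract's point about separate entries for each homophone or polyseme can be illustrated with a small sketch. This is not Kamusi's actual schema. The `Sense` record, its `concept` key linking senses across languages, and the helper functions are hypothetical, and the English/French data is a toy example. It shows why sense-level entries matter: keyed by spelling alone, English "bank" collapses into one entry with mixed translations, while keyed by sense, each meaning keeps its own translation.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Sense:
    """One record per sense, so homophones and polysemes are separate entries."""
    sense_id: str   # hypothetical identifier, not a Kamusi field name
    language: str
    lemma: str
    gloss: str
    concept: str    # hypothetical shared key linking equivalent senses across languages


# Toy data: English "bank" has two senses, so it gets two entries.
SENSES = [
    Sense("en-1", "en", "bank", "financial institution", "C-BANK-FIN"),
    Sense("en-2", "en", "bank", "edge of a river", "C-BANK-RIVER"),
    Sense("fr-1", "fr", "banque", "financial institution", "C-BANK-FIN"),
    Sense("fr-2", "fr", "rive", "edge of a river", "C-BANK-RIVER"),
]


def _lemmas_by_concept(senses, language):
    """Index target-language lemmas by the concept they express."""
    index = defaultdict(list)
    for s in senses:
        if s.language == language:
            index[s.concept].append(s.lemma)
    return index


def bilingual_dictionary(senses, source, target):
    """Sense-keyed dictionary: each (lemma, gloss) pair maps to its own translations."""
    index = _lemmas_by_concept(senses, target)
    return {
        (s.lemma, s.gloss): index.get(s.concept, [])
        for s in senses
        if s.language == source
    }


def spelling_keyed_dictionary(senses, source, target):
    """Contrast: keying on spelling alone merges the senses of a homophone."""
    index = _lemmas_by_concept(senses, target)
    merged = defaultdict(set)
    for s in senses:
        if s.language == source:
            merged[s.lemma].update(index.get(s.concept, []))
    return {lemma: sorted(lemmas) for lemma, lemmas in merged.items()}


if __name__ == "__main__":
    print(bilingual_dictionary(SENSES, "en", "fr"))
    print(spelling_keyed_dictionary(SENSES, "en", "fr"))
```

Here the sense-keyed version keeps "banque" and "rive" under different meanings of "bank", and the spelling-keyed version returns both French words for a single "bank" entry. Because every sense links to a shared concept, the same data also yields a French-to-English dictionary without additional entries.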