Paper: Collecting Bilingual Audio in Remote Indigenous Communities

ACL ID C14-1096
Title Collecting Bilingual Audio in Remote Indigenous Communities
Venue International Conference on Computational Linguistics
Session Main Conference
Year 2014

Most of the world's languages are under-resourced, and most under-resourced languages lack a writing system and literary tradition. As these languages fall out of use, we lose important sources of data that contribute to our understanding of human language. The first, urgent step is to collect and orally translate a large quantity of spoken language. This can be digitally archived and later transcribed, annotated, and subjected to the full range of speech and language processing tasks, at any time in future. We have been investigating a mobile application for recording and translating unwritten languages. We visited indigenous communities in Brazil and Nepal and taught people to use smartphones for recording spoken language and for orally interpreting it into the national language, and c...