Paper: Short-Term Projects, Long-Term Benefits: Four Student NLP Projects for Low-Resource Languages

ACL ID W14-2212
Title Short-Term Projects, Long-Term Benefits: Four Student NLP Projects for Low-Resource Languages
Venue Workshop on the Use of Computational Methods in the Study of Endangered Languages
Session
Year 2014
Authors

This paper describes a local effort to bridge the gap between computational and documentary linguistics by teaching students and young researchers in computational linguistics about doing research and developing systems for low-resource languages. We describe four student software projects developed within one semester. The projects range from a front-end for building small-vocabulary speech recognition systems, to a broad-coverage (more than 1000 languages) language identification system, to language-specific systems: a lemmatizer for the Mayan language Uspanteko and named entity recognition systems for both Slovak and Persian. Teaching efforts such as these are an excellent way to develop not only tools for low-resource languages, but also computational linguists well...