ACL Anthology Network (All About NLP) (beta) The Association Of Computational Linguistics Anthology Network |
ACL ID | D14-1096 |
---|---|
Title | What Can We Get From 1000 Tokens? A Case Study of Multilingual POS Tagging For Resource-Poor Languages |
Venue | Conference on Empirical Methods in Natural Language Processing |
Session | Main Conference |
Year | 2014 |
Authors |
In this paper we address the problem of multilingual part-of-speech tagging for resource-poor languages. We use parallel data to transfer part-of-speech information from resource-rich to resource-poor languages. Additionally, we use a small amount of annotated data to learn to "correct" errors from the projected approach, such as tagset mismatch between languages, achieving state-of-the-art performance (91.3%) across 8 languages. Our approach is based on modest data requirements, and uses minimum divergence classification. For situations where no universal tagset mapping is available, we propose an alternate method, resulting in state-of-the-art 85.6% accuracy on the resource-poor language Malagasy.
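
To make the projection idea in the abstract concrete, here is a minimal sketch of copying part-of-speech tags from a source sentence to a target sentence through word alignments. It is an illustration under stated assumptions, not the authors' implementation: the function name `project_tags`, the alignment format (pairs of source and target indices), the majority-vote rule, and the fallback tag `X` are all hypothetical choices made for this example. The correction step and the minimum divergence classifier mentioned in the abstract are not shown.

```python
from collections import Counter, defaultdict


def project_tags(source_tags, alignments, target_length, default_tag="X"):
    """Project source-side POS tags onto target tokens via word alignments.

    source_tags   -- list of tags, one per source token
    alignments    -- iterable of (source_index, target_index) pairs
    target_length -- number of tokens in the target sentence
    default_tag   -- tag for unaligned target tokens (an assumption of this sketch)
    """
    # Collect one vote per alignment link for each target position.
    votes = defaultdict(Counter)
    for src_idx, tgt_idx in alignments:
        votes[tgt_idx][source_tags[src_idx]] += 1

    projected = []
    for tgt_idx in range(target_length):
        if tgt_idx in votes:
            # Take the tag with the most votes among the aligned source tokens.
            projected.append(votes[tgt_idx].most_common(1)[0][0])
        else:
            # No alignment link: fall back to the placeholder tag.
            projected.append(default_tag)
    return projected


# Toy example: three tagged source tokens, four target tokens,
# with target position 2 left unaligned.
toy_source_tags = ["DET", "NOUN", "VERB"]
toy_alignments = [(0, 0), (1, 1), (2, 3)]
toy_projection = project_tags(toy_source_tags, toy_alignments, target_length=4)
print(toy_projection)
```

Unaligned tokens and conflicting votes are where projected tags tend to be noisy. A small annotated sample, as described in the abstract, could then be used to train a model that corrects them.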