Paper: Learning-Based Named Entity Recognition for Morphologically-Rich Resource-Scarce Languages

ACL ID E09-1041
Title Learning-Based Named Entity Recognition for Morphologically-Rich Resource-Scarce Languages
Venue Annual Meeting of the European Chapter of the Association for Computational Linguistics
Session Main Conference
Year 2009
Authors

Named entity recognition for morphologically rich, case-insensitive languages, including the majority of Semitic languages, Iranian languages, and Indian languages, is inherently more difficult than its English counterpart. Worse still, progress on machine learning approaches to named entity recognition for many of these languages is currently hampered by the scarcity of annotated data and the lack of an accurate part-of-speech tagger. While it is possible to rely on manually-constructed gazetteers to combat data scarcity, this gazetteer-centric approach has the potential weakness of creating irreproducible results, since these name lists are not publicly available in general. Motivated in part by this concern, we present a learning-based named entity recognizer that does not rel...