Paper: Arabic Named Entity Recognition: Using Features Extracted from Noisy Data

ACL ID P10-2052
Title Arabic Named Entity Recognition: Using Features Extracted from Noisy Data
Venue Annual Meeting of the Association for Computational Linguistics
Session Short Paper
Year 2010
Authors

Building an accurate Named Entity Recognition (NER) system for languages with complex morphology is a challenging task. In this paper, we present research that explores the feature space using both gold and bootstrapped noisy features to build an improved, highly accurate Arabic NER system. We bootstrap noisy features by projection from an Arabic-English parallel corpus that is automatically tagged with a baseline NER system. The feature space covers lexical, morphological, and syntactic features. The proposed approach yields an improvement of up to 1.64 points of F-measure (absolute).
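The projection step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how tags are carried across the parallel corpus, so everything here is an assumption: word alignments are given as (English index, Arabic index) pairs, the English side carries BIO tags from the baseline NER system, and each Arabic token receives the majority entity type of the English tokens aligned to it (or "O" if none). The function name `project_entity_tags` is hypothetical, and this is not presented as the authors' exact procedure.

```python
from collections import Counter, defaultdict


def project_entity_tags(en_tags, alignment, ar_len):
    """Project English NER tags onto Arabic tokens through a word alignment.

    Hypothetical helper (an illustration, not the paper's method):
      en_tags   -- BIO tags for the English tokens, e.g. "B-PER", "I-PER", "O"
      alignment -- (en_index, ar_index) pairs produced by some word aligner
      ar_len    -- number of tokens in the Arabic sentence
    Returns one entity type ("PER", "LOC", ...) or "O" per Arabic token,
    intended as a noisy feature rather than a final label.
    """
    votes = defaultdict(list)
    for en_i, ar_j in alignment:
        tag = en_tags[en_i]
        if tag != "O":
            # Drop the B-/I- prefix; keep only the entity type as the vote.
            votes[ar_j].append(tag.split("-", 1)[-1])

    projected = []
    for j in range(ar_len):
        if votes[j]:
            # Majority vote; Counter.most_common breaks ties by first occurrence.
            projected.append(Counter(votes[j]).most_common(1)[0][0])
        else:
            projected.append("O")  # unaligned or aligned only to "O" tokens
    return projected


if __name__ == "__main__":
    # English: "Barack Obama visited Cairo"; Arabic order: visited, Barack, Obama, Cairo
    en_tags = ["B-PER", "I-PER", "O", "B-LOC"]
    ar_tokens = ["زار", "باراك", "أوباما", "القاهرة"]
    alignment = [(0, 1), (1, 2), (2, 0), (3, 3)]
    print(project_entity_tags(en_tags, alignment, len(ar_tokens)))
    # prints ['O', 'PER', 'PER', 'LOC']
```

Both alignment errors and baseline-tagger errors propagate into the projected types, which is why, in this sketch, they are treated as one noisy feature to sit alongside lexical, morphological, and syntactic features rather than as gold annotation.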