Paper: A New Approach to Lexical Disambiguation of Arabic Text

ACL ID: D10-1071
Title: A New Approach to Lexical Disambiguation of Arabic Text
Venue: Conference on Empirical Methods in Natural Language Processing
Session: Main Conference
Year: 2010

We describe a model for the lexical analysis of Arabic text that uses the lists of alternatives supplied by SAMA, a broad-coverage morphological analyzer; these lists include stable lemma IDs that correspond to combinations of broad word-sense categories and POS tags. We break down each of the hundreds of thousands of possible lexical labels into its constituent elements, including lemma ID and part of speech. Features are computed for each lexical token based on its local and document-level context and used in a novel, simple, and highly efficient two-stage supervised machine learning algorithm that overcomes the extreme sparsity of the label distribution in the training data. The resulting system achieves an accuracy of 90.6% for its first choice, and 96.2% for its top two choices, in se...
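The toy sketch below illustrates two ideas from the abstract: why breaking a sparse lexical label into its constituent elements helps, and how first-choice versus top-two accuracy is measured. It is not the paper's two-stage learner, whose details the abstract does not give. The label type `LexicalLabel`, its two fields, the frequency-sum scoring rule, and the data are all hypothetical; real SAMA labels carry more information than a lemma ID and a POS tag.

```python
from collections import Counter
from typing import NamedTuple


class LexicalLabel(NamedTuple):
    """Hypothetical two-part label; real SAMA labels carry more fields."""
    lemma_id: str
    pos: str


def decompose(training_labels):
    """Count each constituent element separately across the training labels."""
    lemma_counts = Counter(label.lemma_id for label in training_labels)
    pos_counts = Counter(label.pos for label in training_labels)
    return lemma_counts, pos_counts


def rank_candidates(candidates, lemma_counts, pos_counts):
    """Order analyzer-supplied candidates by the summed counts of their parts.

    Toy scoring rule (an assumption, not the paper's model): a full
    (lemma_id, pos) pair never seen in training still gets credit for
    whichever of its parts were seen. Counter returns 0 for unseen keys.
    """
    def score(label):
        return lemma_counts[label.lemma_id] + pos_counts[label.pos]
    return sorted(candidates, key=score, reverse=True)


def top_k_accuracy(ranked_lists, gold_labels, k):
    """Fraction of tokens whose gold label is among the first k candidates."""
    hits = sum(1 for ranked, gold in zip(ranked_lists, gold_labels)
               if gold in ranked[:k])
    return hits / len(gold_labels)


# Toy data, invented for illustration only.
train = [
    LexicalLabel("lemma_1", "NOUN"),
    LexicalLabel("lemma_1", "NOUN"),
    LexicalLabel("lemma_2", "VERB"),
    LexicalLabel("lemma_3", "NOUN"),
]
lemma_counts, pos_counts = decompose(train)

# Each test token comes with its own list of candidate analyses.
test_candidates = [
    [LexicalLabel("lemma_2", "VERB"), LexicalLabel("lemma_1", "NOUN")],
    [LexicalLabel("lemma_3", "NOUN"), LexicalLabel("lemma_2", "VERB")],
]
gold = [LexicalLabel("lemma_1", "NOUN"), LexicalLabel("lemma_2", "VERB")]

ranked = [rank_candidates(c, lemma_counts, pos_counts) for c in test_candidates]
print(top_k_accuracy(ranked, gold, 1))  # 0.5
print(top_k_accuracy(ranked, gold, 2))  # 1.0
```

On this toy data the second token's gold label is ranked second, so top-two accuracy exceeds first-choice accuracy. The abstract's 90.6% and 96.2% figures show the same kind of gap.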