Paper: Arabic Morphological Tagging, Diacritization, and Lemmatization Using Lexeme Models and Feature Ranking

ACL ID P08-2030
Title Arabic Morphological Tagging, Diacritization, and Lemmatization Using Lexeme Models and Feature Ranking
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 2008
Authors

We investigate the tasks of general morphological tagging, diacritization, and lemmatization for Arabic. We show that for all tasks we consider, both modeling the lexeme explicitly, and retuning the weights of individual classifiers for the specific task, improve the performance.

1 Previous Work

Arabic is a morphologically rich language: in our training corpus of about 288,000 words we find 3279 distinct morphological tags, with up to 100,000 possible tags. Because of the large number of tags, it is clear that morphological tagging cannot be construed as a simple classification task. Hajič (2000) is the first to use a dictionary as a source of possible morphological analyses (and hence tags) for an inflected word form. He redefines the tagging task as a choice among the tags propo...
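
Two ideas from the passage above, restricting the tagger to analyses that a dictionary proposes and weighting individual feature classifiers per task, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the toy lexicon, the feature names (`pos`, `num`), the classifier predictions, and the weights are all hypothetical.

```python
from typing import Dict, List

Analysis = Dict[str, str]

# Hypothetical toy dictionary: each undiacritized word form maps to the
# analyses (feature bundles) a morphological analyzer might propose.
LEXICON: Dict[str, List[Analysis]] = {
    "ktb": [
        {"lexeme": "kitAb", "pos": "noun", "num": "pl"},
        {"lexeme": "katab", "pos": "verb", "num": "sg"},
    ],
}


def score(analysis: Analysis, predictions: Dict[str, str],
          weights: Dict[str, float]) -> float:
    """Sum the weights of the features on which the analysis agrees
    with the per-feature classifier predictions."""
    return sum(
        weight
        for feature, weight in weights.items()
        if analysis.get(feature) == predictions.get(feature)
    )


def choose_analysis(word: str, predictions: Dict[str, str],
                    weights: Dict[str, float]) -> Analysis:
    """Pick the dictionary-proposed analysis with the highest weighted
    agreement; the tagger never invents a tag outside the candidates."""
    candidates = LEXICON.get(word, [])
    if not candidates:
        raise KeyError(f"no analyses proposed for {word!r}")
    return max(candidates, key=lambda a: score(a, predictions, weights))


# Hypothetical classifier outputs for one token in context.
predictions = {"pos": "verb", "num": "pl"}

# Two hypothetical weight settings, standing in for task-specific tuning.
weights_a = {"pos": 2.0, "num": 1.0}
weights_b = {"pos": 0.5, "num": 1.0}

print(choose_analysis("ktb", predictions, weights_a)["lexeme"])
print(choose_analysis("ktb", predictions, weights_b)["lexeme"])
```

With the first weight setting the part-of-speech classifier dominates and the verb analysis wins. With the second, agreement on number outweighs it and the noun analysis is selected. The classifier predictions are identical in both cases, so the example shows why the weights might be retuned for each task.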