Paper: Determining Case in Arabic: Learning Complex Linguistic Behavior Requires Complex Linguistic Features

ACL ID D07-1116
Title Determining Case in Arabic: Learning Complex Linguistic Behavior Requires Complex Linguistic Features
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 2007
Authors

This paper discusses automatic determination of case in Arabic. This task is a major source of errors in full diacritization of Arabic. We use a gold-standard syntactic tree, and obtain an error rate of about 4.2%, with a machine learning-based system outperforming a system using hand-written rules. A careful error analysis suggests that when we account for annotation errors in the gold standard, the error rate drops to 0.8%, with the hand-written rules outperforming the machine learning-based system.