Paper: Simultaneous Tokenization and Part-Of-Speech Tagging for Arabic without a Morphological Analyzer

ACL ID P10-2063
Title Simultaneous Tokenization and Part-Of-Speech Tagging for Arabic without a Morphological Analyzer
Venue Annual Meeting of the Association for Computational Linguistics
Session Short Paper
Year 2010
Authors

We describe an approach to simultaneous tokenization and part-of-speech tagging that is based on separating the closed and open-class items, and focusing on the likelihood of the possible stems of the open-class words. By encoding some basic linguistic information, the machine learning task is simplified, while achieving state-of-the-art tokenization results and competitive POS results, although with a reduced tag set and some evaluation difficulties.
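
The idea of separating closed-class items from open-class stems can be illustrated with a small sketch. This is not the paper's implementation: the clitic lists (`PREFIXES`, `SUFFIXES`, in Buckwalter transliteration), the `STEM_SCORE` table, and the function names are all hypothetical stand-ins chosen for illustration. The sketch enumerates every split of a word that the closed-class lists allow, then keeps the split whose remaining stem scores highest.

```python
from itertools import product

# Hypothetical closed-class inventories (Buckwalter transliteration); illustrative only.
PREFIXES = ["", "w", "b", "l", "Al", "wAl", "bAl"]
SUFFIXES = ["", "h", "hA", "hm"]

# Hypothetical scores standing in for a learned likelihood over open-class stems.
STEM_SCORE = {"ktAb": 0.6, "wld": 0.3, "ld": 0.01}


def candidate_splits(word):
    """Yield every (prefix, stem, suffix) split allowed by the closed-class lists."""
    for prefix, suffix in product(PREFIXES, SUFFIXES):
        if len(prefix) + len(suffix) >= len(word):
            continue  # the stem must be non-empty
        if word.startswith(prefix) and word.endswith(suffix):
            stem = word[len(prefix):len(word) - len(suffix)]
            yield prefix, stem, suffix


def best_split(word, unknown_score=1e-6):
    """Return the split whose stem has the highest (assumed) score."""
    return max(
        candidate_splits(word),
        key=lambda split: STEM_SCORE.get(split[1], unknown_score),
    )
```

With these toy values, `best_split("wktAbh")` selects the split `w + ktAb + h`, while `best_split("wld")` keeps `wld` whole because the stem `wld` outscores the alternative `w + ld`. The second case shows why the stem score matters: the closed-class lists alone cannot tell whether an initial `w` is a conjunction or part of the stem.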