Paper: A Probabilistic Morphological Analyzer for Syriac

ACL ID D10-1079
Title A Probabilistic Morphological Analyzer for Syriac
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 2010
Authors

We define a probabilistic morphological ana- lyzer using a data-driven approach for Syriac in order to facilitate the creation of an annotated corpus. Syriac is an under-resourced Semitic language for which there are no available lan- guage tools such as morphological analyzers. We introduce novel probabilistic models for segmentation, dictionary linkage, and morpho- logical tagging and connect them in a pipeline tocreateaprobabilisticmorphologicalanalyzer requiringonlylabeleddata. Weexploretheper- formance of models with varying amounts of training data and find that with about 34,500 labeled tokens, we can outperform a reason- able baseline trained on over 99,000 tokens and achieve an accuracy of just over 80%. When trained on all available training data, our joint model achieves 86.47% ...