Paper: Unsupervised Multilingual Learning for Morphological Segmentation

ACL ID P08-1084
Title Unsupervised Multilingual Learning for Morphological Segmentation
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 2008

For centuries, the deep connection between languages has brought about major discoveries about human communication. In this paper we investigate how this powerful source of information can be exploited for unsupervised language learning. In particular, we study the task of morphological segmentation of multiple languages. We present a nonparametric Bayesian model that jointly induces morpheme segmentations of each language under consideration and at the same time identifies cross-lingual morpheme patterns, or abstract morphemes. We apply our model to three Semitic languages, Arabic, Hebrew, and Aramaic, as well as to English. Our results demonstrate that learning morphological models in tandem reduces error by up to 24% relative to monolingual models. Furthermore, we provide ev...
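The "24% relative" figure is a relative error reduction: the share of the monolingual model's error that the multilingual model removes, not a drop of 24 absolute points. The short sketch below only illustrates that arithmetic. The function name is hypothetical, and the error rates in the usage line are made-up values chosen for illustration, not results reported in the paper.

```python
def relative_error_reduction(mono_error: float, multi_error: float) -> float:
    """Fraction of the monolingual model's error removed by the multilingual model.

    Hypothetical helper for illustration; "error" is any error rate in [0, 1].
    """
    if mono_error <= 0:
        raise ValueError("monolingual error must be positive")
    return (mono_error - multi_error) / mono_error


# Hypothetical error rates (not from the paper): 0.50 monolingual vs 0.38 multilingual.
print(round(relative_error_reduction(0.50, 0.38), 2))  # 0.24
```

Under these assumed numbers, the absolute error falls by only 12 points, yet the relative reduction is 24%.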