Paper: Unsupervised Bilingual Morpheme Segmentation and Alignment with Context-rich Hidden Semi-Markov Models

ACL ID P11-1090
Title Unsupervised Bilingual Morpheme Segmentation and Alignment with Context-rich Hidden Semi-Markov Models
Venue Annual Meeting of the Association of Computational Linguistics
Session Main Conference
Year 2011
Authors

This paper describes an unsupervised dynamic graphical model for morphological segmen- tation and bilingual morpheme alignment for statistical machine translation. The model ex- tends Hidden Semi-Markov chain models by using factored output nodes and special struc- tures for its conditional probability distribu- tions. It relies on morpho-syntactic and lex- ical source-side information (part-of-speech, morphological segmentation) while learning a morpheme segmentation over the target lan- guage. Our model outperforms a competi- tive word alignment system in alignment qual- ity. Used in a monolingual morphological seg- mentation setting it substantially improves ac- curacy over previous state-of-the-art models on three Arabic and Hebrew datasets.