Paper: A Language-Independent Unsupervised Model for Morphological Segmentation

ACL ID P07-1116
Title A Language-Independent Unsupervised Model for Morphological Segmentation
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 2007
Authors

Morphological segmentation has been shown to be beneficial to a range of NLP tasks such as machine translation, speech recognition, speech synthesis and information retrieval. Recently, a number of approaches to unsupervised morphological segmentation have been proposed. This paper describes an algorithm that draws from previous approaches and combines them into a simple model for morphological segmentation that outperforms other approaches on English and German, and also yields good results on agglutinative languages such as Finnish and Turkish. We also propose a method for detecting variation within stems in an unsupervised fashion. The segmentation quality reached with the new algorithm is good enough to improve grapheme-to-phoneme conversion.