Paper: Unsupervised Morphological Segmentation with Log-Linear Models

ACL ID N09-1024
Title Unsupervised Morphological Segmentation with Log-Linear Models
Venue Human Language Technologies
Session Main Conference
Year 2009
Authors

Morphological segmentation breaks words into morphemes (the basic semantic units). It is a key component for natural language processing systems. Unsupervised morphological segmentation is attractive, because in every language there are virtually unlimited supplies of text, but very few labeled resources. However, most existing model-based systems for unsupervised morphological segmentation use directed generative models, making it difficult to leverage arbitrary overlapping features that are potentially helpful to learning. In this paper, we present the first log-linear model for unsupervised morphological segmentation. Our model uses overlapping features such as morphemes and their contexts, and incorporates exponential priors inspired by the minimum description length (M...
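
To make the idea of overlapping morpheme and context features concrete, here is a minimal Python sketch of log-linear scoring over the segmentations of a single word. It is an illustration under stated assumptions, not the model from the paper: the function names (`extract_features`, `all_segmentations`, `segmentation_distribution`), the `#` boundary padding, the one-character context window, and the hand-set weights are all hypothetical, the exponential priors are omitted, and the distribution is normalized per word by brute-force enumeration purely for readability.

```python
import math
from collections import Counter


def extract_features(word, boundaries, context_width=1):
    """Count morpheme and context features for one segmentation.

    `boundaries` lists cut positions strictly inside the word,
    e.g. [4] for "walk|ed". The feature templates are illustrative
    assumptions, not the paper's exact feature set.
    """
    cuts = [0] + sorted(boundaries) + [len(word)]
    # Pad with '#' so edge morphemes still get a left/right context.
    padded = "#" * context_width + word + "#" * context_width
    feats = Counter()
    for start, end in zip(cuts, cuts[1:]):
        morpheme = word[start:end]
        left = padded[start:start + context_width]
        right = padded[end + context_width:end + 2 * context_width]
        # The two features overlap: both fire on the same span.
        feats[("morpheme", morpheme)] += 1
        feats[("context", left, right)] += 1
    return feats


def score(feats, weights):
    """Unnormalized log-linear score: dot product of weights and counts."""
    return sum(weights.get(f, 0.0) * count for f, count in feats.items())


def all_segmentations(word):
    """Enumerate every set of internal cut positions (exponential in length)."""
    n = len(word)
    for mask in range(1 << max(n - 1, 0)):
        yield [i for i in range(1, n) if (mask >> (i - 1)) & 1]


def segmentation_distribution(word, weights):
    """Softmax over all segmentations of one word (per-word simplification)."""
    segs = list(all_segmentations(word))
    scores = [score(extract_features(word, b), weights) for b in segs]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    return [(b, e / z) for b, e in zip(segs, exps)]


if __name__ == "__main__":
    # Hypothetical weights chosen by hand for the demonstration.
    weights = {("morpheme", "walk"): 2.0, ("morpheme", "ed"): 2.0}
    dist = segmentation_distribution("walked", weights)
    best_cuts, best_prob = max(dist, key=lambda pair: pair[1])
    print(best_cuts, round(best_prob, 3))
```

Because the score is a weighted sum of arbitrary feature counts, adding another overlapping feature template only requires one more line in `extract_features`, which is the flexibility that directed generative models make harder to obtain.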