Paper: Combining multiple information types in Bayesian word segmentation

ACL ID N13-1012
Title Combining multiple information types in Bayesian word segmentation
Venue Annual Conference of the North American Chapter of the Association for Computational Linguistics
Session Main Conference
Year 2013
Authors

Humans identify word boundaries in continuous speech by combining multiple cues; existing state-of-the-art models, though, look at a single cue. We extend the generative model of Goldwater et al. (2006) to segment using syllable stress as well as phonemic form. Our new model treats identification of word boundaries and prevalent stress patterns in the language as a joint inference task. We show that this model improves segmentation accuracy over purely segmental input representations, and recovers the dominant stress pattern of the data. Additionally, our model retains high performance even without single-word utterances. We also demonstrate a discrepancy in the performance of our model and human infants on an artificial-language task in which stress cues and transition-probabil...