Paper: A Joint Learning Model of Word Segmentation, Lexical Acquisition, and Phonetic Variability

ACL ID D13-1005
Title A Joint Learning Model of Word Segmentation, Lexical Acquisition, and Phonetic Variability
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 2013
Authors

We present a cognitive model of early lexical acquisition which jointly performs word segmentation and learns an explicit model of phonetic variation. We define the model as a Bayesian noisy channel; we sample segmentations and word forms simultaneously from the posterior, using beam sampling to control the size of the search space. Compared to a pipelined approach in which segmentation is performed first, our model is qualitatively more similar to human learners. On data with variable pronunciations, the pipelined approach learns to treat syllables or morphemes as words. In contrast, our joint model, like infant learners, tends to learn multiword collocations. We also conduct analyses of the phonetic variations that the model learns to accept and its patterns of word recognition err...
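
As a rough illustration of the noisy-channel idea mentioned in the abstract, the sketch below scores candidate intended word forms for an observed surface form as P(intended) × P(surface | intended) and normalizes the result. It is not the paper's model: the lexicon, its probabilities, the `KEEP` parameter, and the per-segment substitution channel are all hypothetical simplifications chosen for illustration, and segmentation and beam sampling are not shown.

```python
from math import prod

# Hypothetical toy lexicon prior P(intended); the values are made up.
LEXICON = {"want": 0.5, "went": 0.3, "wand": 0.2}

# Hypothetical channel: each segment is kept with probability KEEP,
# otherwise replaced by one of the other symbols uniformly at random.
KEEP = 0.8
ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def channel_prob(intended, surface):
    """P(surface | intended) under independent per-segment substitution."""
    if len(intended) != len(surface):
        return 0.0  # this sketch ignores insertions and deletions
    sub = (1.0 - KEEP) / (len(ALPHABET) - 1)
    return prod(KEEP if i == s else sub for i, s in zip(intended, surface))


def posterior(surface):
    """P(intended | surface), proportional to P(intended) * P(surface | intended)."""
    scores = {w: p * channel_prob(w, surface) for w, p in LEXICON.items()}
    total = sum(scores.values())
    if total == 0:
        return scores  # no lexicon entry can explain this surface form
    return {w: s / total for w, s in scores.items()}


if __name__ == "__main__":
    post = posterior("wont")
    print(max(post, key=post.get), post)
```

Under these assumed numbers, a surface form such as "wont" is attributed to whichever lexicon entry best balances prior frequency against the number of substituted segments; a joint model of the kind the abstract describes would additionally have to infer the lexicon and the word boundaries.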