Paper: Modelling function words improves unsupervised word segmentation

ACL ID P14-1027
Title Modelling function words improves unsupervised word segmentation
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 2014
Authors

Inspired by experimental psychological findings suggesting that function words play a special role in word learning, we make a simple modification to an Adaptor Grammar-based Bayesian word segmentation model that allows it to learn sequences of monosyllabic "function words" at the beginnings and endings of collocations of (possibly multisyllabic) words. This modification improves unsupervised word segmentation on the standard Bernstein-Ratner (1987) corpus of child-directed English by more than 4% token f-score compared to an otherwise identical model that does not special-case "function words", setting a new state-of-the-art of 92.4% token f-score. Our function word model assumes that function words appear at the left periphery, and while this is true of languages such as English,...
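The results above are reported as token f-score. The following is a minimal Python sketch of how that metric is commonly computed. It assumes the usual definition, in which a predicted word token counts as correct only if both its left and right boundaries match a gold token in the same utterance. The helper names `word_spans` and `token_f_score` are hypothetical, and the example utterance is invented for illustration. It is not taken from the Bernstein-Ratner corpus and is not the authors' evaluation code.

```python
def word_spans(words):
    """Map a segmented utterance (list of non-empty words) to (start, end)
    character offsets over the unsegmented string."""
    spans = []
    pos = 0
    for w in words:
        spans.append((pos, pos + len(w)))
        pos += len(w)
    return spans


def token_f_score(gold, predicted):
    """Corpus-level token precision, recall and f-score.

    Assumption: a predicted token is correct only if its span exactly
    matches a gold token's span (both boundaries agree)."""
    correct = n_pred = n_gold = 0
    for g, p in zip(gold, predicted):
        # Both segmentations must cover the same unsegmented phoneme string.
        assert "".join(g) == "".join(p), "utterances differ"
        g_spans = set(word_spans(g))
        p_spans = word_spans(p)
        n_gold += len(g)
        n_pred += len(p)
        correct += sum(1 for s in p_spans if s in g_spans)
    precision = correct / n_pred if n_pred else 0.0
    recall = correct / n_gold if n_gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f = 2 * precision * recall / (precision + recall)
    return precision, recall, f


# Illustrative example: the segmenter attaches "the" to "book" and "you" to "want".
gold = [["you", "want", "to", "see", "the", "book"]]
pred = [["youwant", "to", "see", "thebook"]]
print(token_f_score(gold, pred))  # precision 0.5, recall 1/3, f-score 0.4
```

In this toy example, merging a function word into its neighbouring word destroys two gold tokens at once and also yields an incorrect predicted token. This is one way such errors can lower token f-score.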