Paper: Unsupervised phonemic Chinese word segmentation using Adaptor Grammars

ACL ID C10-1060
Title Unsupervised phonemic Chinese word segmentation using Adaptor Grammars
Venue International Conference on Computational Linguistics
Session Main Conference
Year 2010
Authors

Adaptor grammars are a framework for expressing and performing inference over a variety of non-parametric linguistic models. These models currently provide state-of-the-art performance on unsupervised word segmentation from phonemic representations of child-directed unsegmented English utterances. This paper investigates the applicability of these models to unsupervised word segmentation of Mandarin. We investigate a wide variety of different segmentation models, and show that the best segmentation accuracy is obtained from models that capture inter-word “collocational” dependencies. Surprisingly, enhancing the models to exploit syllable structure regularities and to capture tone information does not improve overall word segmentation accuracy, perhaps because the information the...