Paper: Unsupervised Word Segmentation in Context

ACL ID C14-1219
Title Unsupervised Word Segmentation in Context
Venue International Conference on Computational Linguistics
Session Main Conference
Year 2014

This paper extends existing word segmentation models to take non-linguistic context into account. It improves the token F-score of a top-performing segmentation model by 2.5% on a dataset of 27k utterances. We posit that word segmentation is easier in context because the learner is not trying to access irrelevant lexical items. We use topics from a Latent Dirichlet Allocation model as a proxy for "activities" contexts to label the Providence corpus. We present Adaptor Grammar models that use these context labels, and we study their performance with and without context annotations at test time.
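For readers unfamiliar with the metric, token F-score is commonly computed by counting a predicted word as correct only when both of its boundaries match a gold word in the same utterance. The following is a minimal sketch under that common definition, not the paper's own evaluation code; the function names `spans` and `token_f_score` and the toy utterance are assumptions made for illustration.

```python
from typing import List, Set, Tuple


def spans(words: List[str]) -> Set[Tuple[int, int]]:
    """Map a segmented utterance to the set of (start, end) character spans of its words."""
    out = set()
    start = 0
    for w in words:
        end = start + len(w)
        out.add((start, end))
        start = end
    return out


def token_f_score(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Token F-score: harmonic mean of token precision and token recall.

    A predicted token is correct only if both of its boundaries match
    a gold token in the same utterance.
    """
    correct = n_gold = n_pred = 0
    for g, p in zip(gold, pred):
        gold_spans, pred_spans = spans(g), spans(p)
        correct += len(gold_spans & pred_spans)
        n_gold += len(gold_spans)
        n_pred += len(pred_spans)
    if correct == 0:
        return 0.0
    precision = correct / n_pred
    recall = correct / n_gold
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: the prediction merges "want" and "the" into one token.
gold = [["you", "want", "the", "book"]]
pred = [["you", "wantthe", "book"]]
score = token_f_score(gold, pred)  # precision 2/3, recall 2/4
```

In this toy case two of the three predicted tokens are correct and two of the four gold tokens are recovered, so the score is 4/7, roughly 0.57.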