Paper: The impact of language models and loss functions on repair disfluency detection

ACL ID P11-1071
Title The impact of language models and loss functions on repair disfluency detection
Venue Annual Meeting of the Association of Computational Linguistics
Session Main Conference
Year 2011
Authors

Unrehearsed spoken language often contains disfluencies. In order to correctly inter- pret a spoken utterance, any such disfluen- cies must be identified and removed or other- wise dealt with. Operating on transcripts of speech which contain disfluencies, we study the effect of language model and loss func- tion on the performance of a linear reranker that rescores the 25-best output of a noisy- channel model. We show that language mod- els trained on large amounts of non-speech data improve performance more than a lan- guage model trained on a more modest amount of speech data, and that optimising f-score rather than log loss improves disfluency detec- tion performance. Our approach uses a log-linear reranker, oper- ating on the top n analyses of a noisy chan- nel model. We use large lang...