Paper: Semi-Supervised Discriminative Language Modeling with Out-of-Domain Text Data

ACL ID N13-1087
Title Semi-Supervised Discriminative Language Modeling with Out-of-Domain Text Data
Venue Annual Conference of the North American Chapter of the Association for Computational Linguistics
Session Main Conference
Year 2013
Authors

One way to improve the accuracy of auto- matic speech recognition (ASR) is to use dis- criminative language modeling (DLM), which enhances discrimination by learning where the ASR hypotheses deviate from the uttered sentences. However, DLM requires large amounts of ASR output to train. Instead, we can simulate the output of an ASR sys- tem, in which case the training becomes semi- supervised. The advantage of using simu- lated hypotheses is that we can generate as many hypotheses as we want provided that we have enough text material. In typical scenar- ios, transcribed in-domain data is limited but large amounts of out-of-domain (OOD) data is available. In this study, we investigate how semi-supervised training performs with OOD data. We find out that OOD data can yield im- provements comp...