Paper: Getting More Mileage From Web Text Sources For Conversational Speech Language Modeling Using Class-Dependent Mixtures

ACL ID N03-2003
Title Getting More Mileage From Web Text Sources For Conversational Speech Language Modeling Using Class-Dependent Mixtures
Venue Human Language Technologies
Session Short Paper
Year 2003
Authors

Sources of training data suitable for language modeling of conversational speech are limited. In this paper, we show how training data can be supplemented with text from the web filtered to match the style and/or topic of the target recognition task, and that it is possible to get bigger performance gains from the data by using class-dependent interpolation of N-grams.
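
To make the idea of class-dependent interpolation concrete, here is a minimal sketch in Python. The abstract does not give the formula, so the sketch assumes that the mixture weights are selected by a class assigned to the most recent word of the history. The function names, the toy probability tables, the class labels, and the weight values are all hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, Sequence, Tuple

History = Tuple[str, ...]
NgramModel = Callable[[str, History], float]


def make_toy_model(table: Dict[Tuple[str, str], float], floor: float = 1e-4) -> NgramModel:
    """Build a toy bigram model from a (previous word, word) -> probability table.

    Hypothetical stand-in for a real N-gram model; unseen pairs get a small floor.
    """
    def prob(word: str, history: History) -> float:
        prev = history[-1] if history else "<s>"
        return table.get((prev, word), floor)
    return prob


def class_dependent_mixture(
    word: str,
    history: History,
    models: Sequence[NgramModel],
    weights_by_class: Dict[str, Sequence[float]],
    word_class: Dict[str, str],
    default_class: str = "OTHER",
) -> float:
    """Interpolate component models with weights chosen by the history's class.

    Assumption: the class is that of the last word in the history.
    """
    cls = word_class.get(history[-1], default_class) if history else default_class
    weights = weights_by_class[cls]
    if len(weights) != len(models):
        raise ValueError("need exactly one weight per component model")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixture weights for a class must sum to 1")
    return sum(lam * model(word, history) for lam, model in zip(weights, models))


# Toy components: one for in-domain conversational data, one for filtered web text.
in_domain = make_toy_model({("uh", "yeah"): 0.30, ("the", "weather"): 0.02})
web_text = make_toy_model({("uh", "yeah"): 0.05, ("the", "weather"): 0.10})

# Hypothetical classes and weights: trust in-domain data after fillers,
# and lean on web text after function words.
WORD_CLASS = {"uh": "FILLER", "the": "FUNCTION"}
WEIGHTS = {"FILLER": (0.9, 0.1), "FUNCTION": (0.4, 0.6), "OTHER": (0.5, 0.5)}

p_yeah = class_dependent_mixture("yeah", ("uh",), [in_domain, web_text], WEIGHTS, WORD_CLASS)
p_weather = class_dependent_mixture("weather", ("the",), [in_domain, web_text], WEIGHTS, WORD_CLASS)
```

With a single class, this reduces to ordinary linear interpolation with one global weight per source. Letting the weights vary by class allows each source to contribute more in the contexts where it is most reliable.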