Paper: Generalized Algorithms For Constructing Statistical Language Models

ACL ID P03-1006
Title Generalized Algorithms For Constructing Statistical Language Models
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 2003
Authors

Recent text and speech processing applications such as speech mining raise new and more general problems related to the construction of language models. We present and describe in detail several new and efficient algorithms to address these more general problems and report experimental results demonstrating their usefulness. We give an algorithm for computing efficiently the expected counts of any sequence in a word lattice output by a speech recognizer or any arbitrary weighted automaton; describe a new technique for creating exact representations of n-gram language models by weighted automata whose size is practical for offline use even for a vocabulary size of about 500,000 words and an n-gram order n = 6; and present a simple and more general technique for constructin...