Paper: A Statistical Language Modeling Approach to Lattice-Based Spoken Document Retrieval

ACL ID D07-1085
Title A Statistical Language Modeling Approach to Lattice-Based Spoken Document Retrieval
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 2007
Authors

Speech recognition transcripts are far from perfect; they are not of sufficient quality to be useful on their own for spoken document retrieval. This is especially the case for con- versational speech. Recent efforts have tried to overcome this issue by using statistics from speech lattices instead of only the 1- best transcripts; however, these efforts have invariably used the classical vector space re- trieval model. This paper presents a novel approach to lattice-based spoken document retrieval using statistical language models: a statistical model is estimated for each doc- ument, and probabilities derived from the document models are directly used to mea- sure relevance. Experimental results show that the lattice-based language modeling method outperforms both the language mod- eling ...