Paper: Grounded Language Modeling for Automatic Speech Recognition of Sports Video

ACL ID P08-1015
Title Grounded Language Modeling for Automatic Speech Recognition of Sports Video
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 2008
Authors

Grounded language models represent the relationship between words and the non-linguistic context in which they are said. This paper describes how they are learned from large corpora of unlabeled video, and are applied to the task of automatic speech recognition of sports video. Results show that grounded language models improve perplexity and word error rate over text-based language models, and further, support video information retrieval better than human-generated speech transcriptions.