Paper: Virtual Evidence for Training Speech Recognizers Using Partially Labeled Data

ACL ID N07-2042
Title Virtual Evidence for Training Speech Recognizers Using Partially Labeled Data
Venue Human Language Technologies
Session Short Paper
Year 2007
Authors

Collecting supervised training data for automatic speech recognition (ASR) systems is both time consuming and expensive. In this paper we use the notion of virtual evidence in a graphical-model based system to reduce the amount of supervisory training data required for sequence learning tasks. We apply this approach to a TIMIT phone recognition system, and show that our VE-based training scheme can, relative to a baseline trained with the full segmentation, yield similar results with only 15.3% of the frames labeled (keeping the number of utterances fixed).