Paper: Learning Sub-Word Units for Open Vocabulary Speech Recognition

ACL ID P11-1072
Title Learning Sub-Word Units for Open Vocabulary Speech Recognition
Venue Annual Meeting of the Association of Computational Linguistics
Session Main Conference
Year 2011
Authors

Large vocabulary speech recognition systems fail to recognize words beyond their vocab- ulary, many of which are information rich terms, like named entities or foreign words. Hybrid word/sub-word systems solve this problem by adding sub-word units to large vo- cabulary word based systems; new words can then be represented by combinations of sub- word units. Previous work heuristically cre- ated the sub-word lexicon from phonetic rep- resentations of text using simple statistics to select common phone sequences. We pro- pose a probabilistic model to learn the sub- word lexicon optimized for a given task. We consider the task of out of vocabulary (OOV) word detection, which relies on output from a hybrid model. A hybrid model with our learned sub-word lexicon reduces error by 6.3% and 7.6% (...