Paper: Improving Information Extraction By Modeling Errors In Speech Recognizer Output

ACL ID H01-1034
Title Improving Information Extraction By Modeling Errors In Speech Recognizer Output
Venue Human Language Technologies
Session Main Conference
Year 2001
Authors

In this paper we describe a technique for improving the performance of an information extraction system for speech data by explicitly modeling the errors in the recognizer output. The approach combines a statistical model of named entity states with a lattice representation of hypothesized words and errors annotated with recognition confidence scores. Additional refinements include the use of multiple error types, improved confidence estimation, and multi-pass processing. In combination, these techniques improve named entity recognition performance over a text-based baseline by 28%.

Keywords ASR error modeling, information extraction, word confidence
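
As an illustration of the general idea in the abstract, the following is a minimal sketch of decoding named entity states over a confidence-annotated lattice. It is not the authors' model. It assumes a toy two-state HMM, a single hypothetical error token `<err>`, and a lattice in which each recognized word with confidence c competes with an error arc of weight 1 - c. All names (`build_lattice`, `viterbi`, `ERROR`) and all probabilities are invented for the example.

```python
import math

ERROR = "<err>"  # hypothetical token standing in for a misrecognized word


def build_lattice(words, confidences):
    """Per position, offer the recognized word (weight c) or an error arc (weight 1 - c)."""
    return [[(w, c), (ERROR, 1.0 - c)] for w, c in zip(words, confidences)]


def viterbi(lattice, states, start, trans, emit, floor=1e-6):
    """Return the best state sequence, maximizing jointly over states and lattice arcs."""

    def arc_score(state, arcs):
        # Log score of the best arc at this position under the given state.
        # Tokens missing from the emission table get the floor probability.
        return max(
            math.log(max(weight, floor)) + math.log(emit[state].get(token, floor))
            for token, weight in arcs
        )

    scores = {s: math.log(start[s]) + arc_score(s, lattice[0]) for s in states}
    back = []
    for arcs in lattice[1:]:
        new_scores, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda p: scores[p] + math.log(trans[p][s]))
            new_scores[s] = scores[prev] + math.log(trans[prev][s]) + arc_score(s, arcs)
            pointers[s] = prev
        scores = new_scores
        back.append(pointers)

    # Trace back from the best final state.
    path = [max(states, key=lambda s: scores[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Toy parameters (assumed for illustration only, not taken from the paper).
states = ["O", "NAME"]
start = {"O": 0.9, "NAME": 0.1}
trans = {"O": {"O": 0.8, "NAME": 0.2}, "NAME": {"O": 0.6, "NAME": 0.4}}
emit = {
    "O": {"mister": 0.3, "said": 0.3, ERROR: 0.05},
    "NAME": {"smith": 0.4, ERROR: 0.3},
}

# "smyth" is a low-confidence, out-of-vocabulary recognizer output.
lattice = build_lattice(["mister", "smyth", "said"], [0.9, 0.3, 0.9])
tags = viterbi(lattice, states, start, trans, emit)
print(tags)
```

In this toy setup the low-confidence word is explained by the error arc, and because the assumed NAME state emits errors more readily than the background state, the decoder can still label that position as a name even though the recognized word itself is unknown to the model.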