Paper: Formatting Time-Aligned ASR Transcripts for Readability

ACL ID N10-1023
Title Formatting Time-Aligned ASR Transcripts for Readability
Venue Human Language Technologies
Session Main Conference
Year 2010
Authors

We address the problem of formatting the out- put of an automatic speech recognition (ASR) system for readability, while preserving word- level timing information of the transcript. Our system enriches the ASR transcript with punc- tuation, capitalization and properly written dates, times and other numeric entities, and our approach can be applied to other format- ting tasks. The method we describe combines hand-crafted grammars with a class-based lan- guage model trained on written text and relies on Weighted Finite State Transducers (WF- STs) for the preservation of start and end time of each word.