ACL Anthology Network (All About NLP) (beta)
ACL ID | N10-1023 |
---|---|
Title | Formatting Time-Aligned ASR Transcripts for Readability |
Venue | Human Language Technologies |
Session | Main Conference |
Year | 2010 |
Authors |
We address the problem of formatting the output of an automatic speech recognition (ASR) system for readability, while preserving word-level timing information of the transcript. Our system enriches the ASR transcript with punctuation, capitalization and properly written dates, times and other numeric entities, and our approach can be applied to other formatting tasks. The method we describe combines hand-crafted grammars with a class-based language model trained on written text and relies on Weighted Finite State Transducers (WFSTs) for the preservation of start and end time of each word.
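
To make the timing constraint concrete, the following is a minimal sketch of what "formatting while preserving word-level timing" can mean. It is not the paper's method: it swaps the WFST and language-model machinery for a few hand-written rules, and the names `Word`, `TENS`, `UNITS` and `format_transcript` are hypothetical, introduced only for illustration. The point it shows is that when several spoken tokens collapse into one written token, the written token can inherit the start time of the first spoken word and the end time of the last.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str     # token as it appears in the transcript
    start: float  # start time in seconds
    end: float    # end time in seconds


# Hypothetical toy lexicons; a real system would use full grammars.
TENS = {"twenty": 20, "thirty": 30, "forty": 40, "fifty": 50}
UNITS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}


def format_transcript(words):
    """Rewrite spoken numbers as digits, capitalize the first token and
    add a final period, keeping a time span on every output token."""
    out = []
    i = 0
    while i < len(words):
        w = words[i]
        nxt = words[i + 1] if i + 1 < len(words) else None
        if w.text in TENS and nxt is not None and nxt.text in UNITS:
            # Two spoken words become one written token: the span runs
            # from the start of the first to the end of the second.
            value = TENS[w.text] + UNITS[nxt.text]
            out.append(Word(str(value), w.start, nxt.end))
            i += 2
        elif w.text in TENS:
            out.append(Word(str(TENS[w.text]), w.start, w.end))
            i += 1
        elif w.text in UNITS:
            out.append(Word(str(UNITS[w.text]), w.start, w.end))
            i += 1
        else:
            out.append(Word(w.text, w.start, w.end))
            i += 1
    if out:
        # Capitalization and punctuation change only the text, never the times.
        out[0].text = out[0].text.capitalize()
        out[-1].text = out[-1].text + "."
    return out


asr_output = [
    Word("we", 0.0, 0.2),
    Word("sold", 0.2, 0.5),
    Word("twenty", 0.5, 0.9),
    Word("three", 0.9, 1.2),
    Word("units", 1.2, 1.6),
]
formatted = format_transcript(asr_output)
for token in formatted:
    print(f"{token.start:.1f}-{token.end:.1f}\t{token.text}")
```

In this toy run "twenty three" becomes the single token "23" spanning 0.5 to 1.2 seconds. A rule-based pass like this cannot weigh competing readings of the same words, which is presumably where the language model and weighted transducers described in the abstract come in.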