Paper: Tight Integration of Speech Disfluency Removal into SMT

ACL ID E14-4009
Title Tight Integration of Speech Disfluency Removal into SMT
Venue Annual Meeting of The European Chapter of The Association of Computational Linguistics
Session Main Conference
Year 2014

Speech disfluencies are one of the main challenges of spoken language processing. Conventional disfluency detection systems deploy a hard decision, which can have a negative influence on subsequent applications such as machine translation. In this paper we suggest a novel approach in which disfluency detection is integrated into the translation process. We train a CRF model to obtain a disfluency probability for each word. The SMT decoder will then skip the potentially disfluent word based on its disfluency probability. Using the suggested scheme, the translation score of both the manual transcript and ASR output is improved by around 0.35 BLEU points compared to the CRF hard decision system.
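
The contrast between a hard decision and a probability-based skip can be illustrated with a toy sketch. This is not the authors' decoder: the function names, the 0.5 threshold, the exhaustive enumeration of keep/skip patterns, and the `repeat_penalty` scorer (a hypothetical stand-in for the translation and language model scores an SMT decoder would supply) are all assumptions made for illustration. The per-word probabilities are made up rather than produced by a trained CRF.

```python
import math
from itertools import product

EPS = 1e-9  # assumption: clamp probabilities so log() never sees 0


def hard_decision(words, probs, threshold=0.5):
    """Baseline: drop every word whose disfluency probability reaches the threshold."""
    return [w for w, p in zip(words, probs) if p < threshold]


def soft_candidates(words, probs):
    """Yield (kept_words, log_prob) for every keep/skip pattern.

    Keeping a word costs log(1 - p); skipping it costs log(p).
    Exhaustive enumeration is exponential and only suitable for toy inputs.
    """
    for mask in product([True, False], repeat=len(words)):
        log_prob, kept = 0.0, []
        for word, p, keep in zip(words, probs, mask):
            p = min(max(p, EPS), 1.0 - EPS)
            log_prob += math.log(1.0 - p) if keep else math.log(p)
            if keep:
                kept.append(word)
        yield kept, log_prob


def best_candidate(words, probs, downstream_score):
    """Pick the pattern maximising disfluency log-prob plus a downstream score."""
    return max(
        soft_candidates(words, probs),
        key=lambda cand: cand[1] + downstream_score(cand[0]),
    )[0]


def repeat_penalty(kept):
    """Hypothetical downstream scorer: penalise immediately repeated words."""
    return -2.0 * sum(a == b for a, b in zip(kept, kept[1:]))


words = ["i", "i", "want", "uh", "tea"]
probs = [0.45, 0.10, 0.05, 0.90, 0.05]  # made-up disfluency probabilities

hard = hard_decision(words, probs)                   # ['i', 'i', 'want', 'tea']
soft = best_candidate(words, probs, repeat_penalty)  # ['i', 'want', 'tea']
```

In this toy input the first "i" falls just under the threshold, so the hard decision keeps the repetition. When the probability is instead combined with a downstream score, the skip becomes preferable, which is the kind of interaction the integrated approach is meant to exploit.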