Paper: Advances in the CMU/Interact Arabic GALE Transcription System

ACL ID N07-2033
Title Advances in the CMU/Interact Arabic GALE Transcription System
Venue Human Language Technologies
Session Short Paper
Year 2007
Authors

This paper describes the CMU/InterACT effort to develop an Arabic automatic speech recognition (ASR) system for broadcast news and conversations within the GALE 2006 evaluation. Over the nine months of preparation for this evaluation, we improved our system by 40% relative compared to our legacy system. These improvements were achieved through several steps: developing a vowelized system, combining this system with a non-vowelized one, harvesting transcripts of TV shows from the web for slightly supervised training of acoustic models, as well as language model adaptation, and finally fine-tuning the overall ASR system.

Index Terms: Speech recognition, Vowelization, GALE, Arabic, Slightly supervised training, web data.