Paper: Dialect Classification for Online Podcasts Fusing Acoustic and Language Based Structural and Semantic Information

ACL ID P08-2006
Title Dialect Classification for Online Podcasts Fusing Acoustic and Language Based Structural and Semantic Information
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 2008
Authors

The variation in speech due to dialect is a factor which significantly impacts speech system performance. In this study, we investigate effective methods of combining acoustic and language information to take advantage of (i) speaker based acoustic traits as well as (ii) content based word selection across the text sequence. For acoustics, a GMM based system is employed and for text based dialect classification, we proposed n-gram language models combined with Latent Semantic Analysis (LSA) based dialect classifiers. The performance of the individual classifiers is established for the three dialect family case (DC rates vary from 69.1%-72.4%). The final combined system achieved a DC accuracy of 79.5% and significantly outperforms the baseline acoustic classifier with a relative...