Paper: Combining Unsupervised and Supervised Alignments for MT: An Empirical Study

ACL ID D10-1065
Title Combining Unsupervised and Supervised Alignments for MT: An Empirical Study
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 2010
Authors

Word alignment plays a central role in statistical MT (SMT) since almost all SMT systems extract translation rules from word aligned parallel training data. While most SMT systems use unsupervised algorithms (e.g. GIZA++) for training word alignment, supervised methods, which exploit a small amount of human-aligned data, have become increasingly popular recently. This work empirically studies the performance of these two classes of alignment algorithms and explores strategies to combine them to improve overall system performance. We used two unsupervised aligners, GIZA++ and HMM, and one supervised aligner, ITG, in this study. To avoid language and genre specific conclusions, we ran experiments on test sets consisting of two language pairs (Chinese-to-English and Arabic-to...