Paper: Enhancing Statistical Machine Translation with Character Alignment

ACL ID P12-2056
Title Enhancing Statistical Machine Translation with Character Alignment
Venue Annual Meeting of the Association for Computational Linguistics
Session Short Paper
Year 2012
Authors

The dominant practice in statistical machine translation (SMT) uses the same Chinese word segmentation specification for both the alignment and the translation rule induction steps when building a Chinese-English SMT system. This can be suboptimal, because a word segmentation that is better for alignment is not necessarily better for translation. To tackle this, we propose a framework that uses two different segmentation specifications for alignment and translation respectively: we use the Chinese character as the basic unit for alignment, and then convert this alignment to a conventional word alignment for translation rule induction. Experimentally, our approach outperformed two baselines: a fully word-based system (using words for both alignment and translation) and a fully character-...
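
The conversion from character alignment to word alignment mentioned in the abstract can be illustrated with a minimal sketch. The excerpt does not spell out the exact conversion rule, so the code below assumes the simplest plausible one: a Chinese word is linked to an English word whenever any of its characters is linked to that English word. The function name `char_to_word_alignment`, the link representation as `(source_index, target_index)` pairs, and the example sentence are all hypothetical and chosen only for illustration.

```python
from typing import List, Set, Tuple

Link = Tuple[int, int]  # (Chinese-side index, English-side index)


def char_to_word_alignment(words: List[str], char_links: Set[Link]) -> Set[Link]:
    """Project character-level links onto a given word segmentation.

    Assumption (not taken from the paper): a word inherits every link
    of every character it contains.
    """
    # Map each character position in the unsegmented sentence to the
    # index of the word that contains it.
    char_to_word: List[int] = []
    for word_index, word in enumerate(words):
        char_to_word.extend([word_index] * len(word))

    word_links: Set[Link] = set()
    for char_index, english_index in char_links:
        if not 0 <= char_index < len(char_to_word):
            raise ValueError(f"character index {char_index} is out of range")
        word_links.add((char_to_word[char_index], english_index))
    return word_links


# Hypothetical example: "我们 喜欢 音乐" aligned to "we like music".
words = ["我们", "喜欢", "音乐"]
char_links = {(0, 0), (1, 0), (2, 1), (3, 1), (4, 2), (5, 2)}
print(sorted(char_to_word_alignment(words, char_links)))
# prints [(0, 0), (1, 1), (2, 2)]
```

Because the projection only needs the segmentation of the Chinese side, the same character alignment can in principle be reused with any word segmentation chosen for translation rule induction.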