Paper: Two Ways to Use a Noisy Parallel News Corpus for Improving Statistical Machine Translation

ACL ID W11-1207
Title Two Ways to Use a Noisy Parallel News Corpus for Improving Statistical Machine Translation
Venue Building and Using Comparable Corpora
Year 2011

In this paper, we present two methods to use a noisy parallel news corpus to improve statistical machine translation (SMT) systems. Taking full advantage of the characteristics of our corpus and of existing resources, we use a bootstrapping strategy, whereby an existing SMT engine is used both to detect parallel sentences in comparable data and to provide an adaptation corpus for translation models. MT experiments demonstrate the benefits of various combinations of these strategies.
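
The abstract does not spell out how the SMT engine detects parallel sentences, so the following is only a minimal sketch of one plausible reading: translate each source sentence with an existing engine, compare the output against candidate target sentences, and keep pairs whose similarity clears a threshold. Every name here (`token_overlap`, `detect_parallel_pairs`, `toy_translate`, `TOY_LEXICON`), the Dice similarity, and the 0.6 threshold are assumptions made for illustration, not details taken from the paper.

```python
from typing import Callable, List, Tuple


def token_overlap(hyp: str, ref: str) -> float:
    """Dice coefficient over lowercase token sets (an assumed stand-in similarity)."""
    h, r = set(hyp.lower().split()), set(ref.lower().split())
    if not h or not r:
        return 0.0
    return 2 * len(h & r) / (len(h) + len(r))


def detect_parallel_pairs(
    source_sents: List[str],
    target_sents: List[str],
    translate: Callable[[str], str],
    threshold: float = 0.6,  # assumed value, not from the paper
) -> List[Tuple[str, str, float]]:
    """Pair each source sentence with its best-scoring target sentence.

    A pair is kept only if the similarity between the machine translation
    of the source and the target candidate reaches `threshold`.
    """
    pairs = []
    for src in source_sents:
        hyp = translate(src)
        best_score, best_tgt = max(
            ((token_overlap(hyp, tgt), tgt) for tgt in target_sents),
            default=(0.0, ""),
        )
        if best_score >= threshold:
            pairs.append((src, best_tgt, best_score))
    return pairs


# Hypothetical word-for-word "engine"; a real SMT system would be plugged in here.
TOY_LEXICON = {"le": "the", "chat": "cat", "dort": "sleeps", "il": "it", "pleut": "rains"}


def toy_translate(sentence: str) -> str:
    return " ".join(TOY_LEXICON.get(w, w) for w in sentence.lower().split())


french = ["Le chat dort", "Il pleut"]
english = ["The cat sleeps", "Stock markets fell sharply"]
extracted = detect_parallel_pairs(french, english, toy_translate)
```

Under this reading, the retained pairs (and, for the second strategy, the machine translations themselves) could be added to the training data used to adapt the translation models. How the paper actually builds and combines these resources is not stated in the abstract.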