Paper: Large-Scale Discriminative Training for Statistical Machine Translation Using Held-Out Line Search

ACL ID N13-1025
Title Large-Scale Discriminative Training for Statistical Machine Translation Using Held-Out Line Search
Venue Annual Conference of the North American Chapter of the Association for Computational Linguistics
Session Main Conference
Year 2013
Authors

We introduce a new large-scale discriminative learning algorithm for machine translation that is capable of learning parameters in models with extremely sparse features. To ensure their reliable estimation and to prevent overfitting, we use a two-phase learning algorithm. First, the contribution of individual sparse features is estimated using large amounts of parallel data. Second, a small development corpus is used to determine the relative contributions of the sparse features and standard dense features. Not only does this two-phase learning approach prevent overfitting, the second pass optimizes corpus-level BLEU of the Viterbi translation of the decoder. We demonstrate significant improvements using sparse rule indicator features in three different translation task...
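
The two-phase idea in the abstract can be illustrated with a small sketch. This is not the authors' algorithm: it assumes toy n-best lists in which each hypothesis carries a single dense model score, a dictionary of sparse rule-indicator features, and a precomputed per-sentence quality value that stands in for corpus-level BLEU. The function names, the count-based phase-one update, and the grid search over a single scaling factor are all hypothetical choices made for illustration.

```python
from collections import Counter


def sparse_score(weights, feats):
    """Dot product of sparse weights with a hypothesis's sparse features."""
    return sum(weights.get(f, 0.0) * v for f, v in feats.items())


def estimate_sparse_weights(train_nbest):
    """Phase 1 (toy assumption): on the large training set, reward features of
    the highest-quality hypothesis and penalize features of the hypothesis
    the dense model alone would pick."""
    weights = Counter()
    for nbest in train_nbest:
        oracle = max(nbest, key=lambda h: h["quality"])
        model_best = max(nbest, key=lambda h: h["dense"])
        for f, v in oracle["sparse"].items():
            weights[f] += v
        for f, v in model_best["sparse"].items():
            weights[f] -= v
    return dict(weights)


def corpus_quality(dev_nbest, weights, alpha):
    """Average quality of the 1-best hypotheses under dense + alpha * sparse.
    A real system would compute corpus-level BLEU here; the averaged
    'quality' field is only a stand-in."""
    total = 0.0
    for nbest in dev_nbest:
        best = max(
            nbest,
            key=lambda h: h["dense"] + alpha * sparse_score(weights, h["sparse"]),
        )
        total += best["quality"]
    return total / len(dev_nbest)


def tune_alpha(dev_nbest, weights, grid):
    """Phase 2 (toy assumption): pick the sparse-vs-dense scaling factor that
    maximizes the held-out metric; ties go to the earliest grid value."""
    return max(grid, key=lambda a: corpus_quality(dev_nbest, weights, a))


# Hypothetical toy data: two training sentences, one development sentence.
train = [
    [{"dense": 1.0, "sparse": {"r1": 1}, "quality": 0.2},
     {"dense": 0.8, "sparse": {"r2": 1}, "quality": 0.9}],
    [{"dense": 0.5, "sparse": {"r1": 1}, "quality": 0.3},
     {"dense": 0.4, "sparse": {"r2": 1}, "quality": 0.7}],
]
dev = [
    [{"dense": 1.0, "sparse": {"r1": 1}, "quality": 0.1},
     {"dense": 0.7, "sparse": {"r2": 1}, "quality": 0.8}],
]

sparse_weights = estimate_sparse_weights(train)
best_alpha = tune_alpha(dev, sparse_weights, [0.0, 0.05, 0.1, 0.5])
```

In this sketch only one scalar is tuned on the small development set, so the many sparse weights are never fit to it directly, which mirrors the overfitting argument made in the abstract.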