Paper: A Projection Extension Algorithm For Statistical Machine Translation

ACL ID W03-1001
Title A Projection Extension Algorithm For Statistical Machine Translation
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 2003
Authors

In this paper, we describe a phrase-based unigram model for statistical machine translation that uses a much simpler set of model parameters than similar phrase- based models. The units of translation are blocks – pairs of phrases. During decod- ing, we use a block unigram model and a word-based trigram language model. Dur- ing training, the blocks are learned from source interval projections using an un- derlying high-precision word alignment. The system performance is significantly increased by applying a novel block exten- sion algorithm using an additional high- recall word alignment. The blocks are fur- ther filtered using unigram-count selection criteria. The system has been successfully test on a Chinese-English and an Arabic- English translation task.