Paper: Automatic Building and Using Parallel Resources for SMT from Comparable Corpora

ACL ID W14-1009
Title Automatic Building and Using Parallel Resources for SMT from Comparable Corpora
Venue Workshop on Hybrid Approaches to Translation
Year 2014

Building parallel resources for corpus-based machine translation, especially Statistical Machine Translation (SMT), from comparable corpora has recently received wide attention in the field of Machine Translation research. In this paper, we propose an automatic approach for extracting parallel fragments from comparable corpora. The comparable corpora are collected from Wikipedia documents, and the approach exploits the multilingualism of Wikipedia. The automatic alignment of parallel text fragments uses a textual entailment technique and a Phrase-Based SMT (PB-SMT) system. The parallel text fragments thus extracted are used as additional parallel translation examples to complement the training data of a PB-SMT system. The additional training data extracted from ...
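To make the alignment step more concrete, here is a minimal sketch under loudly stated assumptions. It is not the authors' method. It assumes the source-side fragments have already been translated into the target language by a PB-SMT system. It also replaces the paper's textual entailment technique with a crude bidirectional lexical-overlap test. The functions `tokens`, `entails`, and `align_fragments`, as well as the `threshold` value of 0.6, are all hypothetical illustrations.

```python
from typing import List, Set, Tuple


def tokens(text: str) -> Set[str]:
    # Hypothetical helper: lowercased whitespace tokenization (a real system
    # would use a proper tokenizer for each language).
    return set(text.lower().split())


def entails(hypothesis: str, premise: str, threshold: float) -> bool:
    # Stand-in for textual entailment (an assumption, not the paper's technique):
    # the premise "entails" the hypothesis if it covers enough hypothesis tokens.
    h = tokens(hypothesis)
    if not h:
        return False
    return len(h & tokens(premise)) / len(h) >= threshold


def align_fragments(
    translated_src: List[str], tgt: List[str], threshold: float = 0.6
) -> List[Tuple[int, int]]:
    # Pair source fragment i (already machine-translated into the target
    # language) with target fragment j when the overlap test holds in both
    # directions; the pairs would serve as extra parallel training examples.
    pairs = []
    for i, hyp in enumerate(translated_src):
        for j, cand in enumerate(tgt):
            if entails(hyp, cand, threshold) and entails(cand, hyp, threshold):
                pairs.append((i, j))
    return pairs


if __name__ == "__main__":
    translated = [
        "the city is located on the river",
        "it has a population of two million",
    ]
    target = [
        "the city has a population of two million people",
        "the city is located on the river bank",
        "football is popular",
    ]
    print(align_fragments(translated, target))  # prints [(0, 1), (1, 0)]
```

Checking entailment in both directions is a simple way to reject pairs where one fragment merely contains a few words of a much longer one. A real pipeline would substitute a proper entailment component and tune the threshold on held-out data.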