Paper: Using RBMT Systems to Produce Bilingual Corpus for SMT

ACL ID D07-1030
Title Using RBMT Systems to Produce Bilingual Corpus for SMT
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 2007
Authors

This paper proposes a method that uses an existing Rule-based Machine Translation (RBMT) system as a black box to produce a synthetic bilingual corpus, which is then used as training data for a Statistical Machine Translation (SMT) system. We use the existing RBMT system to translate a monolingual corpus into a synthetic bilingual corpus. With the synthetic bilingual corpus, we can build an SMT system even if there is no real bilingual corpus. In our experiments using BLEU as a metric, the system achieves a relative improvement of 11.7% over the best RBMT system that is used to produce the synthetic bilingual corpora. We also interpolate the model trained on a real bilingual corpus with the models trained on the synthetic bilingual corpora. The interpolated model achieves an absolute i...
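
The two steps named in the abstract, producing a synthetic bilingual corpus from a black-box translator and interpolating models trained on different corpora, can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: `toy_rbmt` is a hypothetical word-for-word stand-in for a real RBMT system, the function names are invented, and the models are simplified to dictionaries of phrase-pair probabilities. The abstract does not say which form of interpolation the authors use, so plain linear interpolation is shown as one common choice, not as the paper's method.

```python
from typing import Callable, Dict, Iterable, List, Sequence, Tuple

PhrasePair = Tuple[str, str]
Model = Dict[PhrasePair, float]


def make_synthetic_corpus(
    sentences: Iterable[str],
    translate: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Pair each non-empty source sentence with its black-box translation."""
    pairs = []
    for sentence in sentences:
        sentence = sentence.strip()
        if not sentence:
            continue  # skip blank lines in the monolingual corpus
        pairs.append((sentence, translate(sentence)))
    return pairs


def interpolate_models(models: Sequence[Model], weights: Sequence[float]) -> Model:
    """Linearly interpolate phrase-pair probabilities (assumed scheme).

    A phrase pair missing from a model contributes probability 0 for that model.
    """
    if len(models) != len(weights):
        raise ValueError("need exactly one weight per model")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    keys = set()
    for model in models:
        keys.update(model)
    return {
        key: sum(w * m.get(key, 0.0) for m, w in zip(models, weights))
        for key in keys
    }


# Hypothetical stand-in for an RBMT system: a word-for-word dictionary lookup.
TOY_LEXICON = {"hello": "bonjour", "world": "monde", "good": "bon"}


def toy_rbmt(sentence: str) -> str:
    """Translate word by word, leaving unknown words unchanged."""
    return " ".join(TOY_LEXICON.get(word, word) for word in sentence.split())


if __name__ == "__main__":
    corpus = make_synthetic_corpus(["hello world", "  ", "good"], toy_rbmt)
    print(corpus)

    real_model = {("a", "b"): 0.8}
    synthetic_model = {("a", "b"): 0.4, ("a", "c"): 0.6}
    print(interpolate_models([real_model, synthetic_model], [0.5, 0.5]))
```

In a real setting, `translate` would wrap a call to the RBMT system, and the resulting sentence pairs would be passed to a standard SMT training pipeline. The interpolation weights would normally be tuned on held-out data rather than fixed by hand.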