Paper: Cache-based Document-level Statistical Machine Translation

ACL ID D11-1084
Title Cache-based Document-level Statistical Machine Translation
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 2011

Statistical machine translation systems are usually trained on a large number of bilingual sentence pairs and translate one sentence at a time, ignoring document-level information. In this paper, we propose a cache-based approach to document-level translation. Since caches mainly depend on relevant data to supervise subsequent decisions, it is critical to fill the caches with highly relevant data of a reasonable size. We present three kinds of caches to store relevant document-level information: 1) a dynamic cache, which stores bilingual phrase pairs from the best translation hypotheses of previous sentences in the test document; 2) a static cache, which stores relevant bilingual phrase pairs extracted from similar bilingual document pairs (i.e. source ...
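
As a rough illustration of the dynamic cache idea described in the abstract, the following minimal Python sketch keeps a bounded set of bilingual phrase pairs taken from the best hypotheses of earlier sentences in the same document. It is not the authors' implementation: the class name `DynamicCache`, the `max_size` parameter, the oldest-first eviction policy, and the `decode` callback are all assumptions introduced here for illustration.

```python
from collections import OrderedDict


class DynamicCache:
    """Bounded store of (source phrase, target phrase) pairs.

    Assumption: when the cache is full, the least recently added pair is
    evicted. The abstract only says the cache should have a reasonable size.
    """

    def __init__(self, max_size=100):
        self.max_size = max_size
        self._pairs = OrderedDict()

    def update(self, phrase_pairs):
        """Add the phrase pairs used in the best hypothesis of one sentence."""
        for pair in phrase_pairs:
            self._pairs.pop(pair, None)  # re-insert so the pair counts as most recent
            self._pairs[pair] = True
            if len(self._pairs) > self.max_size:
                self._pairs.popitem(last=False)  # evict the oldest pair

    def __contains__(self, pair):
        return pair in self._pairs

    def __len__(self):
        return len(self._pairs)


def translate_document(sentences, decode, cache):
    """Translate sentences in order, feeding each best hypothesis into the cache.

    `decode(sentence, cache)` is a hypothetical decoder callback that returns
    (translation, phrase_pairs_used); a real decoder would consult the cache
    when scoring candidate phrase pairs.
    """
    translations = []
    for sentence in sentences:
        translation, used_pairs = decode(sentence, cache)
        cache.update(used_pairs)
        translations.append(translation)
    return translations
```

In this sketch the cache is created once per test document and passed to `translate_document`, so that choices made for earlier sentences are visible when later sentences are decoded.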