Paper: Scaling Phrase-Based Statistical Machine Translation To Larger Corpora And Longer Phrases

ACL ID P05-1032
Title Scaling Phrase-Based Statistical Machine Translation To Larger Corpora And Longer Phrases
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 2005
Authors

In this paper we describe a novel data structure for phrase-based statistical machine translation which allows for the retrieval of arbitrarily long phrases while simultaneously using less memory than is required by current decoder implementations. We detail the computational complexity and average retrieval times for looking up phrase translations in our suffix array-based data structure. We show how sampling can be used to reduce the retrieval time by orders of magnitude with no loss in translation quality.
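
To make the idea of suffix-array phrase lookup with sampling concrete, the following is a minimal Python sketch. It is not the authors' implementation: the function names (build_suffix_array, find_phrase, sample_occurrences), the naive sort-based construction, and the fixed-seed uniform sampling are all assumptions chosen for illustration. It covers only locating occurrences of a source phrase in a tokenized corpus, not the extraction of translations from aligned data.

```python
import random


def build_suffix_array(tokens):
    """Return corpus positions sorted by the token suffix starting at each one.

    Naive construction for illustration only; it copies every suffix,
    so it is suitable for toy corpora, not large ones.
    """
    return sorted(range(len(tokens)), key=lambda i: tokens[i:])


def find_phrase(tokens, sa, phrase):
    """Return the half-open range [start, end) of suffix-array slots whose
    suffixes begin with `phrase` (a list of tokens), using two binary searches.
    """
    n = len(phrase)

    # Lower bound: first slot whose n-token prefix is >= phrase.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if tokens[sa[mid]:sa[mid] + n] < phrase:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first slot whose n-token prefix is > phrase.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if tokens[sa[mid]:sa[mid] + n] <= phrase:
            lo = mid + 1
        else:
            hi = mid
    return start, lo


def sample_occurrences(sa, start, end, max_samples, seed=0):
    """Return at most `max_samples` corpus positions from the matched range.

    Assumption: uniform sampling without replacement with a fixed seed.
    """
    positions = sa[start:end]
    if len(positions) <= max_samples:
        return sorted(positions)
    rng = random.Random(seed)
    return sorted(rng.sample(positions, max_samples))


# Toy usage: locate a phrase, then cap how many occurrences are inspected.
corpus = "the cat sat on the mat and the cat ran".split()
suffix_array = build_suffix_array(corpus)
start, end = find_phrase(corpus, suffix_array, ["the", "cat"])
occurrences = sample_occurrences(suffix_array, start, end, max_samples=100)
```

In this sketch each binary search performs a number of prefix comparisons logarithmic in the corpus size, and each comparison touches at most as many tokens as the phrase contains, so phrase length is not bounded in advance. Capping the number of occurrences inspected is what limits the cost of handling very frequent phrases.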