Paper: Intersecting Multilingual Data for Faster and Better Statistical Translations

ACL ID: N09-1015
Title: Intersecting Multilingual Data for Faster and Better Statistical Translations
Venue: Human Language Technologies
Session: Main Conference
Year: 2009
Authors:
  • Yu Chen (Saarland University, Saarbrücken, Germany; German Research Center for Artificial Intelligence, Saarbrücken, Germany)
  • Martin Kay (Saarland University, Saarbrücken, Germany; Stanford University, Stanford, CA)
  • Andreas Eisele

In current phrase-based SMT systems, more training data is generally better than less. However, a larger data set eventually introduces a larger model that enlarges the search space for the translation problem, and consequently requires more time and more resources to translate. We argue that redundant information in an SMT system may not only delay the computations but also affect the quality of the outputs. This paper proposes an approach to reduce the model size by filtering out the less probable entries based on compatible data in an intermediate language, a novel use of triangulation, without sacrificing the translation quality. Comprehensive experiments were conducted on standard data sets. We achieved significant quality improvements (up to 2.3 BLEU points) while transla...
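The abstract does not spell out the exact filtering criterion. As a minimal sketch, assuming phrase tables are plain Python dictionaries and that a direct entry counts as "compatible" when at least one intermediate-language phrase links its source and target sides, triangulation-based filtering could look like the following. The function names `triangulate` and `filter_phrase_table`, the dictionary layout, and the toy German/French/English phrases are hypothetical illustrations, not the authors' implementation.

```python
from __future__ import annotations


def triangulate(src_to_pivot: dict[str, set[str]],
                pivot_to_tgt: dict[str, set[str]]) -> set[tuple[str, str]]:
    """Return every (source, target) pair linked through at least one pivot phrase."""
    reachable = set()
    for src, pivots in src_to_pivot.items():
        for pivot in pivots:
            # A pivot phrase missing from the second table contributes nothing.
            for tgt in pivot_to_tgt.get(pivot, set()):
                reachable.add((src, tgt))
    return reachable


def filter_phrase_table(direct: dict[str, dict[str, float]],
                        src_to_pivot: dict[str, set[str]],
                        pivot_to_tgt: dict[str, set[str]]) -> dict[str, dict[str, float]]:
    """Keep only direct entries confirmed by a pivot path (hypothetical criterion)."""
    reachable = triangulate(src_to_pivot, pivot_to_tgt)
    filtered = {}
    for src, options in direct.items():
        # Keep a translation option only if some pivot phrase connects src and tgt.
        kept = {tgt: p for tgt, p in options.items() if (src, tgt) in reachable}
        if kept:  # drop source phrases left with no confirmed translation
            filtered[src] = kept
    return filtered


# Hypothetical toy data: German -> English direct table, French as the pivot language.
direct = {
    "das Haus": {"the house": 0.7, "the home": 0.2, "that house": 0.1},
    "der Hund": {"the dog": 0.9},
}
de_fr = {"das Haus": {"la maison"}}
fr_en = {"la maison": {"the house", "the home"}}

result = filter_phrase_table(direct, de_fr, fr_en)
print(result)  # {'das Haus': {'the house': 0.7, 'the home': 0.2}}
```

In this sketch the filter is only as good as the pivot tables' coverage: "der Hund" disappears entirely because the toy pivot data has no path for it. A practical system might exempt such entries or combine the pivot check with a probability threshold; how the paper handles this case is not stated in the abstract.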