Paper: Deciphering Foreign Language by Combining Language Models and Context Vectors

ACL ID P12-1017
Title Deciphering Foreign Language by Combining Language Models and Context Vectors
Venue Annual Meeting of the Association of Computational Linguistics
Session Main Conference
Year 2012
Authors

In this paper we show how to train statistical machine translation systems on real-life tasks using only non-parallel monolingual data from two languages. We present a modification of the method shown in (Ravi and Knight, 2011) that is scalable to vocabulary sizes of several thousand words. On the task shown in (Ravi and Knight, 2011) we obtain better results with only 5% of the computational effort when running our method with an n-gram language model. The efficiency improvement of our method allows us to run experiments with vocabulary sizes of around 5,000 words, such as a non-parallel version of the VERBMOBIL corpus. We also report results using data from the monolingual French and English GIGAWORD corpora.
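
The abstract does not spell out how context vectors enter the decipherment process, so the following is only a minimal, hypothetical sketch of the general idea named in the title: build co-occurrence (context) vectors for words from each monolingual corpus, project one side through a candidate translation table, and score candidate word pairs by cosine similarity. The toy corpora, the window size, and the seed dictionary below are illustrative placeholders, not the authors' actual setup or data.

```python
from collections import Counter, defaultdict
from math import sqrt


def context_vectors(sentences, window=2):
    """Count co-occurring words within +/- `window` positions of each word."""
    vectors = defaultdict(Counter)
    for sent in sentences:
        for i, word in enumerate(sent):
            lo, hi = max(0, i - window), min(len(sent), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][sent[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v.get(k, 0) for k in u)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def mapped_vector(vec, translation):
    """Project a source-language context vector into the target vocabulary
    using a (partial) candidate translation table."""
    out = Counter()
    for word, count in vec.items():
        if word in translation:
            out[translation[word]] += count
    return out


# Toy monolingual data and a hypothetical partial seed dictionary (placeholders).
french = [["le", "chat", "dort"], ["le", "chien", "dort"]]
english = [["the", "cat", "sleeps"], ["the", "dog", "sleeps"]]
seed = {"le": "the", "dort": "sleeps"}

f_vecs, e_vecs = context_vectors(french), context_vectors(english)
score = cosine(mapped_vector(f_vecs["chat"], seed), e_vecs["cat"])
print(f"context similarity chat/cat: {score:.2f}")
```

In the toy example the candidate pair chat/cat receives a high similarity because both words share the same (mapped) neighbours; a signal of this kind could complement an n-gram language model score, but the precise combination used in the paper is not described in the abstract.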