Paper: Statistical Machine Translation in Low Resource Settings

ACL ID N13-2008
Title Statistical Machine Translation in Low Resource Settings
Venue Annual Conference of the North American Chapter of the Association for Computational Linguistics
Session Student Session
Year 2013

My thesis will explore ways to improve the performance of statistical machine translation (SMT) in low resource conditions. Specifically, it aims to reduce the dependence of modern SMT systems on expensive parallel data. We define low resource settings as having only small amounts of parallel data available, which is the case for many language pairs. All current SMT models use parallel data during training for extracting translation rules and estimating translation probabilities. The theme of our approach is the integration of information from alternate data sources, other than parallel corpora, into the statistical model. In particular, we focus on making use of large monolingual and comparable corpora. By augmenting components of the SMT framework, we hope to extend its applica...