Paper: Docent: A Document-Level Decoder for Phrase-Based Statistical Machine Translation

ACL ID P13-4033
Title Docent: A Document-Level Decoder for Phrase-Based Statistical Machine Translation
Venue Annual Meeting of the Association for Computational Linguistics
Session System Demonstration
Year 2013
Authors

We describe Docent, an open-source decoder for statistical machine translation that breaks with the usual sentence-by-sentence paradigm and translates complete documents as units. By taking translation to the document level, our decoder can handle feature models with arbitrary discourse-wide dependencies and constitutes an essential infrastructure component in the quest for discourse-aware SMT models.

1 Motivation

Most of the research on statistical machine translation (SMT) that was conducted during the last 20 years treated every text as a "bag of sentences" and disregarded all relations between elements in different sentences. Systematic research into explicitly discourse-related problems has only begun very recently in the SMT community (Hardmeier, 2012) with work on topic...