Paper: Bilingually Motivated Domain-Adapted Word Segmentation for Statistical Machine Translation

ACL ID E09-1063
Title Bilingually Motivated Domain-Adapted Word Segmentation for Statistical Machine Translation
Venue Annual Meeting of The European Chapter of The Association of Computational Linguistics
Session Main Conference
Year 2009
Authors

We introduce a word segmentation approach to languages where word boundaries are not orthographically marked, with application to Phrase-Based Statistical Machine Translation (PB-SMT). Instead of using manually segmented monolingual domain-specific corpora to train segmenters, we make use of bilingual corpora and statistical word alignment techniques. First of all, our approach is adapted for the specific translation task at hand by taking the corresponding source (target) language into account. Secondly, this approach does not rely on manually segmented training data so that it can be automatically adapted for different domains. We evaluate the performance of our segmentation approach on PB-SMT tasks from two domains and demonstrate that our approach scores consisten...
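
The idea of deriving segmentation from word alignment rather than from manually segmented data can be pictured with a minimal sketch. This is an illustration under stated assumptions, not the paper's actual procedure: it assumes each source character has already been aligned to a target word index by some statistical aligner, and it merges adjacent characters that share the same target word. The function name `segment_by_alignment`, the input format, and the example alignment are all hypothetical.

```python
from itertools import groupby


def segment_by_alignment(chars, aligned_to):
    """Group adjacent characters that are aligned to the same target word.

    chars: list of source characters (the unsegmented sentence).
    aligned_to: list of the same length; aligned_to[i] is the index of the
        target word that character i is aligned to, or None if unaligned.

    Both inputs are hypothetical. A real system would obtain the alignment
    from a statistical word aligner run on character-split source text.
    """
    if len(chars) != len(aligned_to):
        raise ValueError("chars and aligned_to must have the same length")

    words = []
    # groupby merges only consecutive items with an equal key, so two
    # non-adjacent characters aligned to the same target word stay separate.
    for target, group in groupby(zip(chars, aligned_to), key=lambda p: p[1]):
        piece = [c for c, _ in group]
        if target is None:
            # Unaligned characters are kept as single-character words.
            words.extend(piece)
        else:
            words.append("".join(piece))
    return words


# Hypothetical example: six characters, with an assumed alignment to
# target word indices 0, 1 and 2; the fifth character is unaligned.
example = segment_by_alignment(list("机器翻译很好"), [0, 0, 1, 1, None, 2])
print(example)  # ['机器', '翻译', '很', '好']
```

Because the grouping depends on which target words the characters align to, the same source string could be segmented differently for a different target language or domain, which is the adaptation property the abstract describes.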