Paper: Syntax-to-Morphology Mapping in Factored Phrase-Based Statistical Machine Translation from English to Turkish

ACL ID P10-1047
Title Syntax-to-Morphology Mapping in Factored Phrase-Based Statistical Machine Translation from English to Turkish
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 2010
Authors

We present a novel scheme to apply factored phrase-based SMT to a language pair with very disparate morphological structures. Our approach relies on syntactic analysis on the source side (English) and then encodes a wide variety of local and non-local syntactic structures as complex structural tags which appear as additional factors in the training data. On the target side (Turkish), we only perform morphological analysis and disambiguation but treat the complete complex morphological tag as a factor, instead of separating morphemes. We incrementally explore capturing various syntactic substructures as complex tags on the English side, and evaluate how our translations improve in BLEU scores. Our maximal set of source and target side transformations, coupled with so...
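
To make the idea of a tag carried as an extra factor concrete, the following is a minimal sketch of how a token and its factors might be serialized for a factored phrase-based system. It is not taken from the paper: the `Token` type, the `to_factored` helper, the pipe separator (the convention used by Moses-style factored corpora), and the example Turkish analysis are all illustrative assumptions.

```python
from typing import NamedTuple, Sequence


class Token(NamedTuple):
    """One word with its factors (hypothetical layout: surface, root, tag)."""
    surface: str
    root: str
    tag: str  # the complete morphological tag, kept as a single factor


def to_factored(tokens: Sequence[Token], sep: str = "|") -> str:
    """Serialize tokens as space-separated words with sep-joined factors."""
    for tok in tokens:
        for factor in tok:
            # A factor must be non-empty and must not contain the separator
            # or whitespace, otherwise the line could not be parsed back.
            if not factor or sep in factor or any(c.isspace() for c in factor):
                raise ValueError(f"invalid factor: {factor!r}")
    return " ".join(sep.join(tok) for tok in tokens)


# Illustrative Turkish token: the tag is one unit, not split into morphemes.
turkish = [Token("evlerinde", "ev", "+Noun+A3pl+P3sg+Loc")]
factored_line = to_factored(turkish)
```

The point of the sketch is only that the whole morphological tag travels as one factor next to the root, rather than being split into separate morpheme tokens; the actual factor inventory and tag set used in the paper may differ.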