Paper: Part-of-Speech Induction in Dependency Trees for Statistical Machine Translation

ACL ID P13-1083
Title Part-of-Speech Induction in Dependency Trees for Statistical Machine Translation
Venue Annual Meeting of the Association of Computational Linguistics
Session Main Conference
Year 2013
Authors

This paper proposes a nonparametric Bayesian method for inducing Part-of- Speech (POS) tags in dependency trees to improve the performance of statistical machine translation (SMT). In particular, we extend the monolingual infinite tree model (Finkel et al., 2007) to a bilin- gual scenario: each hidden state (POS tag) of a source-side dependency tree emits a source word together with its aligned tar- get word, either jointly (joint model), or independently (independent model). Eval- uations of Japanese-to-English translation on the NTCIR-9 data show that our in- duced Japanese POS tags for dependency trees improve the performance of a forest- to-string SMT system. Our independent model gains over 1 point in BLEU by re- solving the sparseness problem introduced in the joint model.