Paper: Binarizing Syntax Trees to Improve Syntax-Based Machine Translation Accuracy

ACL ID D07-1078
Title Binarizing Syntax Trees to Improve Syntax-Based Machine Translation Accuracy
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 2007
Authors

We show that phrase structures in Penn Treebank style parses are not optimal for syntax-based machine translation. We exploit a series of binarization methods to restructure the Penn Treebank style trees such that syntactified phrases smaller than Penn Treebank constituents can be acquired and exploited in translation. We find that employing the EM algorithm to determine the binarization of a parse tree among a set of alternative binarizations gives the best translation result.