Paper: Forest-based Translation Rule Extraction

ACL ID D08-1022
Title Forest-based Translation Rule Extraction
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 2008
  • Haitao Mi (Chinese Academy of Sciences, Beijing China)
  • Liang Huang (University of Pennsylvania, Philadelphia PA; Chinese Academy of Sciences, Beijing China)

Translation rule extraction is a fundamental problem in machine translation, especially for linguistically syntax-based systems that need parse trees from either or both sides of the bitext. The current dominant practice only uses 1-best trees, which adversely affects the rule set quality due to parsing errors. So we propose a novel approach which extracts rules from a packed forest that compactly encodes exponentially many parses. Experiments show that this method improves translation quality by over 1 BLEU point on a state-of-the-art tree-to-string system, and is 0.5 points better than (and twice as fast as) extracting on 30-best parses. When combined with our previous work on forest-based decoding, it achieves a 2.5 BLEU points improvement over the baseline, and even outperforms ...
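
To illustrate the claim that a packed forest compactly encodes exponentially many parses, here is a minimal sketch in Python. It is not the authors' implementation: the dictionary-of-hyperedges representation, the node labels, and the names `count_trees`, `toy_forest`, and `chain_forest` are all assumptions made for this example. Each node maps to a list of hyperedges, and each hyperedge is a tuple of child nodes; a node with several hyperedges is an ambiguity point shared by every parse that passes through it.

```python
# Hypothetical representation: node id -> list of hyperedges,
# where each hyperedge is a tuple of child node ids.
# Nodes absent from the dict are treated as leaves.

def count_trees(forest, root):
    """Count the distinct trees packed under `root` (memoized)."""
    memo = {}

    def visit(node):
        if node in memo:
            return memo[node]
        edges = forest.get(node, [])
        if not edges:
            memo[node] = 1  # a leaf contributes exactly one (trivial) tree
            return 1
        total = 0
        for tails in edges:
            # Children of one hyperedge combine independently (product);
            # alternative hyperedges are mutually exclusive (sum).
            product = 1
            for child in tails:
                product *= visit(child)
            total += product
        memo[node] = total
        return total

    return visit(root)


# Toy attachment ambiguity: the PP either attaches inside the NP
# or directly under the VP. Labels and spans are illustrative only.
toy_forest = {
    "VP[0,3]": [("V[0,1]", "NP[1,3]"), ("V[0,1]", "NP[1,2]", "PP[2,3]")],
    "NP[1,3]": [("NP[1,2]", "PP[2,3]")],
}


def chain_forest(k):
    """Build a forest with k independent binary ambiguities in a chain."""
    forest = {}
    for i in range(k):
        nxt = f"X{i + 1}"
        forest[f"X{i}"] = [(f"A{i}", nxt), (f"B{i}", nxt)]
    return forest


print(count_trees(toy_forest, "VP[0,3]"))
print(count_trees(chain_forest(10), "X0"))
```

In the chain example the forest grows linearly with `k` (two hyperedges per ambiguous node) while the number of packed trees doubles with each added node, which is the compactness property the abstract relies on when contrasting forests with k-best lists.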