Paper: Hierarchical Phrase-Based Translation with Suffix Arrays

ACL ID: D07-1104
Title: Hierarchical Phrase-Based Translation with Suffix Arrays
Venue: Conference on Empirical Methods in Natural Language Processing
Session: Main Conference
Year: 2007
  • Adam Lopez (University of Maryland, College Park MD)

A major engineering challenge in statistical machine translation systems is the efficient representation of extremely large translation rulesets. In phrase-based models, this problem can be addressed by storing the training data in memory and using a suffix array as an efficient index to quickly look up and extract rules on the fly. Hierarchical phrase-based translation introduces the added wrinkle of source phrases with gaps. Lookup algorithms used for contiguous phrases no longer apply, and the best approximate pattern matching algorithms are much too slow, taking several minutes per sentence. We describe new lookup algorithms for hierarchical phrase-based translation that reduce the empirical computation time by nearly two orders of magnitude, making on-the-fly lookup feasible fo...
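The two lookup problems in the abstract can be made concrete with a minimal sketch. It assumes the training corpus is a single list of tokens and ignores sentence boundaries. `lookup` finds a contiguous phrase with binary search over a suffix array. `lookup_gapped` is a deliberately naive baseline for a gapped pattern such as `cat X the`: it joins the sorted occurrence lists of the pattern's contiguous pieces. All function names and the `max_span` and `min_gap` parameters are hypothetical illustrations. This is not the paper's lookup algorithm or code.

```python
from bisect import bisect_left


def build_suffix_array(corpus):
    # Naive construction (hypothetical helper): sort every suffix start
    # position by the token sequence that follows it. Fine for a toy
    # corpus; real systems use much faster construction algorithms.
    return sorted(range(len(corpus)), key=lambda i: corpus[i:])


def _bound(corpus, sa, phrase, upper):
    # Binary search for the first suffix whose first len(phrase) tokens
    # are >= phrase (upper=False) or > phrase (upper=True).
    k = len(phrase)
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        prefix = corpus[sa[mid]:sa[mid] + k]
        if prefix < phrase or (upper and prefix == phrase):
            lo = mid + 1
        else:
            hi = mid
    return lo


def lookup(corpus, sa, phrase):
    # Contiguous phrase: all matching suffixes form one block of the
    # suffix array, located with two binary searches.
    lo = _bound(corpus, sa, phrase, upper=False)
    hi = _bound(corpus, sa, phrase, upper=True)
    return sorted(sa[lo:hi])  # start positions in corpus order


def lookup_gapped(corpus, sa, chunks, max_span=10, min_gap=1):
    # Naive baseline for chunks[0] X chunks[1] X ..., where each X is a
    # gap of at least min_gap tokens and the whole match covers at most
    # max_span tokens (both limits are illustrative assumptions).
    # Returns one tuple of chunk start positions per match.
    matches = [(p,) for p in lookup(corpus, sa, chunks[0])]
    for j in range(1, len(chunks)):
        occ = lookup(corpus, sa, chunks[j])  # sorted occurrence list
        extended = []
        for m in matches:
            earliest = m[-1] + len(chunks[j - 1]) + min_gap
            for q in occ[bisect_left(occ, earliest):]:
                if q + len(chunks[j]) - m[0] > max_span:
                    break  # occ is sorted, so later q only widen the span
                extended.append(m + (q,))
        matches = extended
    return matches


corpus = "the cat sat on the mat and the cat ate".split()
sa = build_suffix_array(corpus)
print(lookup(corpus, sa, ["the", "cat"]))             # [0, 7]
print(lookup_gapped(corpus, sa, [["cat"], ["the"]]))  # [(1, 4), (1, 7)]
```

In this sketch, the work done by the gapped join grows with the number of occurrences of each contiguous piece, so patterns built from frequent words are the expensive case. The faster lookup algorithms the paper describes are not reproduced here.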