Paper: Inversion Transduction Grammar Constraints for Mining Parallel Sentences from Quasi-Comparable Corpora

ACL ID I05-1023
Title Inversion Transduction Grammar Constraints for Mining Parallel Sentences from Quasi-Comparable Corpora
Venue International Joint Conference on Natural Language Processing
Session Main Conference
Year 2005
Authors

We present a new implication of Wu’s (1997) Inversion Transduction Grammar (ITG) Hypothesis for the problem of retrieving truly parallel sentence translations from large collections of highly non-parallel documents. Our approach leverages a strong language-universal constraint posited by the ITG Hypothesis, which can serve as a strong inductive bias for various language learning problems, resulting in both efficiency and accuracy gains. The task we attack is highly practical, since non-parallel multilingual data exists in far greater quantities than parallel corpora, but parallel sentences are a much more useful resource. Our aim here is to mine truly parallel sentences, as opposed to comparable sentence pairs or loose translations as in most previous work. The method we introduce...