Paper: Aligning Sentences In Bilingual Corpora Using Lexical Information

ACL ID P93-1002
Title Aligning Sentences In Bilingual Corpora Using Lexical Information
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 1993
Authors

In this paper, we describe a fast algorithm for aligning sentences with their translations in a bilingual corpus. Existing efficient algorithms ignore word identities and consider only sentence length (Brown et al., 1991b; Gale and Church, 1991). Our algorithm constructs a simple statistical word-to-word translation model on the fly during alignment. We find the alignment that maximizes the probability of generating the corpus with this translation model. We have achieved an error rate of approximately 0.4% on Canadian Hansard data, a significant improvement over previous results. The algorithm is language independent.
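The central idea is to choose the alignment that maximizes the probability of the corpus under a word-to-word translation model. A small dynamic-programming sketch can illustrate it. This is not the paper's algorithm, and it rests on several assumptions. It uses a fixed, hand-written translation table t[(e, f)] instead of one built on the fly. It scores each group of aligned sentences (a "bead") with an IBM-Model-1-style probability in place of the paper's own model. The bead priors (BEAD_PRIORS), the smoothing floor (FLOOR) and the NULL word are hypothetical names with illustrative values.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Sentence = List[str]
Bead = Tuple[List[int], List[int]]  # (source sentence indices, target sentence indices)

# Hypothetical bead types (#source, #target sentences) with illustrative priors.
BEAD_PRIORS: Dict[Tuple[int, int], float] = {
    (1, 1): 0.89, (1, 0): 0.01, (0, 1): 0.01, (2, 1): 0.045, (1, 2): 0.045,
}
FLOOR = 1e-4      # assumed probability for word pairs missing from the table
NULL = "<NULL>"   # assumed empty source word that may generate any target word


def bead_logprob(src: Sentence, tgt: Sentence, t: Dict[Tuple[str, str], float]) -> float:
    """log P(tgt | src) under a Model-1-style word-to-word table t[(e, f)]."""
    cands = src + [NULL]
    total = 0.0
    for f in tgt:
        # Each target word is generated by a uniformly chosen source word (or NULL).
        p = sum(t.get((e, f), FLOOR) for e in cands) / len(cands)
        total += math.log(p)
    return total


def align(src_sents: Sequence[Sentence], tgt_sents: Sequence[Sentence],
          t: Dict[Tuple[str, str], float]) -> List[Bead]:
    """Viterbi search for the bead sequence with the highest total log probability."""
    n, m = len(src_sents), len(tgt_sents)
    neg = float("-inf")
    best = [[neg] * (m + 1) for _ in range(n + 1)]
    back: List[List[Optional[Tuple[int, int]]]] = [[None] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            for (di, dj), prior in BEAD_PRIORS.items():
                if di > i or dj > j or best[i - di][j - dj] == neg:
                    continue
                src = [w for s in src_sents[i - di:i] for w in s]
                tgt = [w for s in tgt_sents[j - dj:j] for w in s]
                cand = best[i - di][j - dj] + math.log(prior) + bead_logprob(src, tgt, t)
                if cand > best[i][j]:
                    best[i][j], back[i][j] = cand, (di, dj)
    # Trace back from (n, m) to (0, 0) to recover the beads.
    beads: List[Bead] = []
    i, j = n, m
    while (i, j) != (0, 0):
        step = back[i][j]
        assert step is not None  # every cell is reachable via 1-0 and 0-1 beads
        di, dj = step
        beads.append((list(range(i - di, i)), list(range(j - dj, j))))
        i, j = i - di, j - dj
    beads.reverse()
    return beads


# Toy usage with a hypothetical English-French table.
t = {("the", "le"): 0.7, ("cat", "chat"): 0.8, ("sleeps", "dort"): 0.8,
     ("dog", "chien"): 0.8, ("barks", "aboie"): 0.8}
en = [["the", "cat", "sleeps"], ["the", "dog", "barks"]]
fr = [["le", "chat", "dort"], ["le", "chien", "aboie"]]
print(align(en, fr, t))  # [([0], [0]), ([1], [1])]
```

Every source sentence falls into exactly one bead, so in this sketch the score reduces to P(target | source, alignment) times the bead priors. Dropping a source sentence therefore costs only its (1, 0) prior. The search visits every (i, j) cell and is quadratic in corpus length. A practical aligner would likely restrict it to a band near the diagonal.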