Paper: Improved Unsupervised Sentence Alignment for Symmetrical and Asymmetrical Parallel Corpora

ACL ID C10-2010
Title Improved Unsupervised Sentence Alignment for Symmetrical and Asymmetrical Parallel Corpora
Venue International Conference on Computational Linguistics
Session Poster Session
Year 2010
Authors

We address the problem of unsupervised, language-pair-independent alignment of symmetrical and asymmetrical parallel corpora. Asymmetrical parallel corpora contain a large proportion of 1-to-0/0-to-1 and 1-to-many/many-to-1 sentence correspondences. We have developed a novel approach that is fast and achieves high accuracy, measured in F1, when aligning both asymmetrical and symmetrical parallel corpora. The source code of our aligner and the test sets are freely available.
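To make the correspondence types and the F1 measure concrete, here is a minimal sketch, assuming an alignment is stored as a list of "beads": each bead pairs a tuple of source-sentence indices with a tuple of target-sentence indices, where an empty tuple marks the 0 side of a 1-to-0 or 0-to-1 correspondence. The helper names `bead_type` and `alignment_f1` are hypothetical, and scoring F1 over exactly matching beads is one common convention assumed here for illustration. It is not necessarily the evaluation used in this paper, and the code does not implement the paper's aligner.

```python
from collections import Counter

# Hypothetical representation: (source indices, target indices).
# An empty tuple marks the 0 side of a 1-to-0 / 0-to-1 bead.
Bead = tuple[tuple[int, ...], tuple[int, ...]]


def bead_type(bead: Bead) -> str:
    """Label a bead by its cardinality, e.g. '1-0', '0-1', '1-2'."""
    src, tgt = bead
    return f"{len(src)}-{len(tgt)}"


def alignment_f1(predicted: list[Bead], gold: list[Bead]) -> float:
    """F1 over exactly matching beads (an assumed evaluation convention)."""
    pred_set, gold_set = set(predicted), set(gold)
    correct = len(pred_set & gold_set)
    if correct == 0:
        return 0.0
    precision = correct / len(pred_set)
    recall = correct / len(gold_set)
    return 2 * precision * recall / (precision + recall)


# Toy asymmetrical example: gold contains 1-1, 1-0, 1-2 and 0-1 beads.
gold = [((0,), (0,)), ((1,), ()), ((2,), (1, 2)), ((), (3,))]
# The prediction gets the 1-1 and 1-0 beads right but splits the 1-2 bead wrongly.
predicted = [((0,), (0,)), ((1,), ()), ((2,), (1,)), ((), (2, 3))]

print(Counter(bead_type(b) for b in gold))
print(round(alignment_f1(predicted, gold), 3))  # 0.5
```

Under this exact-match convention, a bead that is only partially correct (such as predicting 1-1 where the gold standard has 1-2) earns no credit, which is why 1-to-many and 1-to-0 correspondences weigh heavily on F1 for asymmetrical corpora.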