Paper: Bayesian Word Alignment for Massively Parallel Texts

ACL ID E14-4024
Title Bayesian Word Alignment for Massively Parallel Texts
Venue Annual Meeting of The European Chapter of The Association of Computational Linguistics
Session Main Conference
Year 2014
Authors

There has been a great amount of work done in the field of bitext alignment, but the problem of aligning words in massively parallel texts with hundreds or thousands of languages is largely unexplored. While the basic task is similar, there are also important differences in purpose, method and evaluation between the problems. In this work, I present a non-parametric Bayesian model that can be used for simultaneous word alignment in massively parallel corpora. This method is evaluated on a corpus containing 1144 translations of the New Testament.