Paper: Why Initialization Matters for IBM Model 1: Multiple Optima and Non-Strict Convexity

ACL ID P11-2081
Title Why Initialization Matters for IBM Model 1: Multiple Optima and Non-Strict Convexity
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 2011
Authors

Contrary to popular belief, we show that the optimal parameters for IBM Model 1 are not unique. We demonstrate that, for a large class of words, IBM Model 1 is indifferent among a continuum of ways to allocate probability mass to their translations. We study the magnitude of the variance in optimal model parameters using a linear programming approach as well as multiple random trials, and demonstrate that it results in variance in test set log-likelihood and alignment error rate.
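
The non-uniqueness claim can be illustrated on a toy case. The sketch below is not taken from the paper: the one-sentence corpus, the function names, and the omission of the NULL source word are all simplifying assumptions made for illustration. Two source words e1 and e2 always co-occur, so the Model 1 likelihood depends only on the sums t(f|e1) + t(f|e2), and every value of a in [0, 1] gives the same, maximal likelihood.

```python
import math

def model1_log_likelihood(corpus, t):
    """Log-likelihood of the target sentences under a simplified IBM Model 1.

    Assumption for this sketch: no NULL source word, and the uniform
    alignment prior is 1 / len(src).
    """
    total = 0.0
    for src, tgt in corpus:
        for f in tgt:
            total += math.log(sum(t[(f, e)] for e in src) / len(src))
    return total

# Hypothetical toy corpus: e1 and e2 only ever appear together.
corpus = [(("e1", "e2"), ("f1", "f2"))]

def params(a):
    """Translation table with t(f1|e1) = a and t(f1|e2) = 1 - a.

    Each row t(.|e) sums to 1, so this is a valid parameter setting
    for every a in [0, 1].
    """
    return {
        ("f1", "e1"): a, ("f2", "e1"): 1.0 - a,
        ("f1", "e2"): 1.0 - a, ("f2", "e2"): a,
    }

# A family of different parameter settings with identical likelihood.
likelihoods = [model1_log_likelihood(corpus, params(a))
               for a in (0.0, 0.25, 0.5, 1.0)]

# A setting outside that family, for comparison: both source words
# put most of their mass on f1, which lowers the likelihood.
skewed = {
    ("f1", "e1"): 0.9, ("f2", "e1"): 0.1,
    ("f1", "e2"): 0.9, ("f2", "e2"): 0.1,
}
skewed_likelihood = model1_log_likelihood(corpus, skewed)

print(likelihoods, skewed_likelihood)
```

In this toy setting the four values in `likelihoods` coincide, while `skewed_likelihood` is strictly lower, which is the behavior one would expect from an objective that is concave but not strictly concave in the translation parameters.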