Paper: Better Hypothesis Testing for Statistical Machine Translation: Controlling for Optimizer Instability

ACL ID P11-2031
Title Better Hypothesis Testing for Statistical Machine Translation: Controlling for Optimizer Instability
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 2011
Authors

In statistical machine translation, a researcher seeks to determine whether some innovation (e.g., a new feature, model, or inference algorithm) improves translation quality in comparison to a baseline system. To answer this question, the researcher runs an experiment to evaluate the behavior of the two systems on held-out data. In this paper, we consider how to make such experiments more statistically reliable. We provide a systematic analysis of the effects of optimizer instability (an extraneous variable that is seldom controlled for) on experimental outcomes, and make recommendations for reporting results more accurately.
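As a minimal sketch of what controlling for optimizer instability can look like in practice, assume each system is tuned several times with independent optimizer runs (e.g., different random seeds) and each tuned model is scored once on the same held-out set with a corpus-level metric such as BLEU. The helpers below are hypothetical: `summarize_replicates` reports the mean and sample standard deviation across runs, and `pairwise_win_rate` is an illustrative diagnostic showing how often a single-run comparison would favor the new system. Neither is presented as the paper's own significance test.

```python
import statistics
from itertools import product
from typing import Sequence


def summarize_replicates(scores: Sequence[float]) -> tuple[float, float]:
    """Return (mean, sample standard deviation) of metric scores obtained
    from independent optimizer runs of ONE system (hypothetical helper)."""
    if len(scores) < 2:
        # A single run gives no estimate of optimizer variance at all.
        raise ValueError("need at least two optimizer replicates to estimate variance")
    return statistics.mean(scores), statistics.stdev(scores)


def pairwise_win_rate(baseline: Sequence[float], system: Sequence[float]) -> float:
    """Fraction of (baseline run, system run) pairings in which the system
    scores strictly higher than the baseline (hypothetical diagnostic).

    A value well below 1.0 means that comparing one arbitrarily chosen
    optimizer run per system could have led to the opposite conclusion.
    """
    pairs = list(product(baseline, system))
    if not pairs:
        raise ValueError("both systems need at least one score")
    wins = sum(1 for b, s in pairs if s > b)
    return wins / len(pairs)
```

Reporting the mean together with the standard deviation makes the spread caused by the optimizer visible to readers, while the win rate gives a quick sense of how fragile a single-run comparison between the two systems would be.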