Paper: Fast, Cheap, and Creative: Evaluating Translation Quality Using Amazon’s Mechanical Turk

ACL ID D09-1030
Title Fast, Cheap, and Creative: Evaluating Translation Quality Using Amazon’s Mechanical Turk
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 2009
Authors

Manual evaluation of translation quality is generally thought to be excessively time-consuming and expensive. We explore a fast and inexpensive way of doing it using Amazon’s Mechanical Turk to pay small sums to a large number of non-expert annotators. For $10 we redundantly recreate judgments from a WMT08 translation task. We find that when combined, non-expert judgments have a high level of agreement with the existing gold-standard judgments of machine translation quality, and correlate more strongly with expert judgments than Bleu does. We go on to show that Mechanical Turk can be used to calculate human-mediated translation edit rate (HTER), to conduct reading comprehension experiments with machine translation, and to create high-quality reference translations.
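As a rough illustration of the agreement analysis the abstract describes (not code from the paper), the sketch below combines redundant non-expert judgments by majority vote and measures raw agreement and Cohen's kappa against gold-standard expert labels. The label values, the five-judgments-per-item setup, and the voting scheme are assumptions made for the example.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa between two label sequences of equal length."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, estimated from each labeler's label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum((freq_a[l] / n) * (freq_b[l] / n)
                   for l in set(labels_a) | set(labels_b))
    return (observed - expected) / (1 - expected)

def majority_vote(judgments):
    """Combine redundant non-expert judgments for one item by majority vote."""
    return Counter(judgments).most_common(1)[0][0]

# Hypothetical data: five Turker judgments per item, plus expert gold labels.
turker_judgments = [
    ["better", "better", "worse", "better", "better"],
    ["worse", "worse", "worse", "better", "worse"],
]
expert_labels = ["better", "worse"]

combined = [majority_vote(j) for j in turker_judgments]
agreement = sum(c == e for c, e in zip(combined, expert_labels)) / len(expert_labels)
print("Agreement rate:", agreement)
print("Cohen's kappa:", cohen_kappa(combined, expert_labels))
```

In practice one would run this over all WMT-style pairwise or ranking judgments rather than two toy items; the same majority-combined labels could also be correlated against expert rankings to compare with Bleu, as the abstract reports.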