Paper: Improving Statistical Machine Translation with Word Class Models

ACL ID D13-1138
Title Improving Statistical Machine Translation with Word Class Models
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 2013
Authors

Automatically clustering words from a monolingual or bilingual training corpus into classes is a widely used technique in statistical natural language processing. We present a very simple and easy-to-implement method for using these word classes to improve translation quality. It can be applied across different machine translation paradigms and with arbitrary types of models. We show its efficacy on a small German→English and a larger French→German translation task with both standard phrase-based and hierarchical phrase-based translation systems for a common set of models. Our results show that with word class models, the baseline can be improved by up to 1.4% BLEU and 1.0% TER on the French→German task and 0.3% BLEU and 1.1% TER on the German→English task.
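
As an illustration of the general idea of word class models, the following minimal sketch maps each word to a class identifier and collects statistics over the resulting class sequences. It is not taken from the paper: the mapping WORD_CLASSES, the UNKNOWN_CLASS fallback, and the helper names are hypothetical, and the bigram counts merely stand in for whatever model type would be trained on class-mapped data.

```python
from collections import Counter

# Hypothetical word-to-class mapping. In practice it would be produced by
# an automatic clustering tool run on the training corpus.
WORD_CLASSES = {
    "the": 0, "a": 0,
    "house": 1, "car": 1,
    "is": 2, "was": 2,
    "red": 3, "small": 3,
}
UNKNOWN_CLASS = -1  # assumption: words without a class share one fallback id


def to_classes(sentence, word_classes=WORD_CLASSES):
    """Replace every whitespace-separated token by its class identifier."""
    return [word_classes.get(word, UNKNOWN_CLASS) for word in sentence.lower().split()]


def class_bigram_counts(corpus):
    """Count adjacent class pairs over a corpus of sentences."""
    counts = Counter()
    for sentence in corpus:
        classes = to_classes(sentence)
        counts.update(zip(classes, classes[1:]))
    return counts


corpus = ["the house is red", "a car was small"]
counts = class_bigram_counts(corpus)
```

Because both sentences map to the same class sequence, every class bigram is observed twice, although no word bigram repeats. This reduction in sparsity is the usual motivation for class-based models.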