Paper: Bilingually-constrained Phrase Embeddings for Machine Translation

ACL ID P14-1011
Title Bilingually-constrained Phrase Embeddings for Machine Translation
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 2014

We propose Bilingually-constrained Recursive Auto-encoders (BRAE) to learn semantic phrase embeddings (compact vector representations for phrases), which can distinguish phrases with different semantic meanings. The BRAE is trained in a way that simultaneously minimizes the semantic distance of translation equivalents and maximizes the semantic distance of non-translation pairs. After training, the model learns how to embed each phrase semantically in two languages and also learns how to transform the semantic embedding space of one language into the other. We evaluate our proposed method on two end-to-end SMT tasks (phrase table pruning and decoding with phrasal semantic similarities) which need to measure semantic similarity between a source phrase and its translation candidates...
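The training objective described above (pull translation equivalents together, push non-translation pairs apart) can be sketched as a max-margin loss over cross-lingual semantic distances. The following is a minimal illustration, not the paper's implementation: the `tanh` transform, Euclidean distance, and the margin value are assumptions chosen for concreteness.

```python
import numpy as np

def semantic_distance(src_vec, tgt_vec, W, b):
    """Map a source-phrase embedding into the target space and
    measure its Euclidean distance to the target-phrase embedding.
    (The transform and distance here are illustrative assumptions.)"""
    mapped = np.tanh(W @ src_vec + b)
    return 0.5 * np.sum((mapped - tgt_vec) ** 2)

def bilingual_hinge_loss(src_vec, tgt_vec, neg_tgt_vec, W, b, margin=1.0):
    """Hinge loss: the distance to the true translation should be
    smaller than the distance to a non-translation by at least `margin`."""
    d_pos = semantic_distance(src_vec, tgt_vec, W, b)      # translation pair
    d_neg = semantic_distance(src_vec, neg_tgt_vec, W, b)  # non-translation pair
    return max(0.0, margin + d_pos - d_neg)
```

In the full model this loss would be applied symmetrically in both translation directions and backpropagated through the recursive auto-encoder that composes word vectors into phrase vectors.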