Paper: Monolingual Distributional Profiles for Word Substitution in Machine Translation

ACL ID C10-2037
Title Monolingual Distributional Profiles for Word Substitution in Machine Translation
Venue International Conference on Computational Linguistics
Session Poster Session
Year 2010
Authors

Out-of-vocabulary (OOV) words present a significant challenge for Machine Trans- lation. For low-resource languages, lim- ited training data increases the frequency of OOV words and this degrades the qual- ity of the translations. Past approaches have suggested using stems or synonyms for OOV words. Unlike the previous methods, we show how to handle not just the OOV words but rare words as well in an Example-based Machine Transla- tion (EBMT) paradigm. Presence of OOV words and rare words in the input sentence prevents the system from finding longer phrasal matches and produces low qual- ity translations due to less reliable lan- guage model estimates. The proposed method requires only a monolingual cor- pus of the source language to find can- didate replacements. A new framework is introd...