Paper: Phrase Clustering for Smoothing TM Probabilities - or How to Extract Paraphrases from Phrase Tables

ACL ID C10-1069
Title Phrase Clustering for Smoothing TM Probabilities - or How to Extract Paraphrases from Phrase Tables
Venue International Conference on Computational Linguistics
Session Main Conference
Year 2010
Authors

This paper describes how to cluster together the phrases of a phrase-based statistical machine translation (SMT) system, using information in the phrase table itself. The clustering is symmetric and recursive: it is applied both to source-language and target-language phrases, and the clustering in one language helps determine the clustering in the other. The phrase clusters have many possible uses. This paper looks at one of these uses: smoothing the conditional translation model (TM) probabilities employed by the SMT system. We incorporated phrase-cluster-derived probability estimates into a baseline loglinear feature combination that included relative frequency and lexically-weighted conditional probability estimates. In Chinese-English (C-E) and French-English ...
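To make the idea of a phrase-cluster-derived probability estimate concrete, the following is a minimal sketch of one generic class-based estimate, p(s | t) ≈ p(C(s) | C(t)) · p(s | C(s)), where C(·) maps a phrase to its cluster. The abstract does not state the formula the paper uses, so this decomposition is an assumption, and the function name, the toy counts, and the cluster assignments are all hypothetical.

```python
from collections import defaultdict


def cluster_smoothed_prob(counts, src_cluster, tgt_cluster):
    """Build a cluster-based estimate of p(source | target).

    Assumed form (not taken from the paper):
        p(s | t) ~= p(C(s) | C(t)) * p(s | C(s))

    counts      : dict mapping (source_phrase, target_phrase) -> count
    src_cluster : dict mapping each source phrase to a cluster id
    tgt_cluster : dict mapping each target phrase to a cluster id
    """
    cluster_pair = defaultdict(float)       # count(C(s), C(t))
    tgt_cluster_total = defaultdict(float)  # count(C(t))
    src_phrase_total = defaultdict(float)   # count(s)
    src_cluster_total = defaultdict(float)  # count(C(s))

    for (s, t), c in counts.items():
        cs, ct = src_cluster[s], tgt_cluster[t]
        cluster_pair[(cs, ct)] += c
        tgt_cluster_total[ct] += c
        src_phrase_total[s] += c
        src_cluster_total[cs] += c

    def prob(s, t):
        cs, ct = src_cluster[s], tgt_cluster[t]
        if tgt_cluster_total[ct] == 0 or src_cluster_total[cs] == 0:
            return 0.0
        p_cluster = cluster_pair[(cs, ct)] / tgt_cluster_total[ct]
        p_member = src_phrase_total[s] / src_cluster_total[cs]
        return p_cluster * p_member

    return prob


# Hypothetical toy phrase table (French source, English target).
counts = {
    ("mourir", "die"): 3,
    ("succomber", "die"): 1,
    ("succomber", "pass away"): 2,
    ("manger", "eat"): 4,
}
# Hypothetical cluster assignments.
src_cluster = {"mourir": 0, "succomber": 0, "manger": 1}
tgt_cluster = {"die": 0, "pass away": 0, "eat": 1}

p = cluster_smoothed_prob(counts, src_cluster, tgt_cluster)

# ("mourir", "pass away") never co-occurs in the toy table, so its relative
# frequency is zero, yet the cluster-based estimate is non-zero.
print(p("mourir", "pass away"))
```

In a loglinear model such as the baseline described above, an estimate of this kind would typically enter as the logarithm of the probability, as one more feature alongside the relative-frequency and lexically-weighted estimates. How the paper actually weights and combines these features is not specified in the abstract.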