Paper: Power-Law Distributions for Paraphrases Extracted from Bilingual Corpora

ACL ID E12-1002
Title Power-Law Distributions for Paraphrases Extracted from Bilingual Corpora
Venue Annual Meeting of The European Chapter of The Association of Computational Linguistics
Session Main Conference
Year 2012
Authors

We describe a novel method that extracts paraphrases from a bitext, for both the source and target languages. In order to reduce the search space, we decompose the phrase-table into sub-phrase-tables and construct separate clusters for source and target phrases. We convert the clusters into graphs, add smoothing/syntactic-information-carrier vertices, and compute the similarity between phrases with a random walk-based measure, the commute time. The resulting phrase-paraphrase probabilities are built upon the conversion of the commute times into artificial co-occurrence counts with a novel technique. The co-occurrence count distribution belongs to the power-law family.
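
As an illustration of the commute time mentioned in the abstract, the sketch below computes it for every pair of vertices in a small weighted undirected graph, using the standard identity C(i, j) = vol(G) (L+_ii + L+_jj - 2 L+_ij), where L+ is the pseudoinverse of the graph Laplacian and vol(G) is the sum of all vertex degrees. The function name `commute_times`, the toy three-vertex graph, and its unit edge weights are assumptions made for illustration; they are not taken from the paper, and the sketch does not cover the paper's graph construction or its conversion of commute times into co-occurrence counts.

```python
import numpy as np


def commute_times(weights):
    """Return the matrix of pairwise commute times for a connected,
    undirected graph given by a symmetric weight matrix (hypothetical helper)."""
    W = np.asarray(weights, dtype=float)
    degrees = W.sum(axis=1)
    laplacian = np.diag(degrees) - W
    # Moore-Penrose pseudoinverse of the graph Laplacian.
    l_plus = np.linalg.pinv(laplacian)
    volume = degrees.sum()  # sum of all vertex degrees
    diag = np.diag(l_plus)
    # C(i, j) = vol(G) * (L+_ii + L+_jj - 2 * L+_ij)
    return volume * (diag[:, None] + diag[None, :] - 2.0 * l_plus)


# Toy graph (assumed): two source phrases (vertices 0 and 2) that share a
# single target phrase (vertex 1), with unit edge weights.
W = np.array([
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 0.0],
])

C = commute_times(W)
print(C[0, 2])  # commute time between the two source phrases
```

A smaller commute time means a random walker moves between the two vertices more quickly on average, which is why it can serve as a similarity measure: phrases connected through many short, heavily weighted paths end up closer than phrases linked by a single long path.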