Paper: Hitting the Right Paraphrases in Good Time

ACL ID N10-1017
Title Hitting the Right Paraphrases in Good Time
Venue Human Language Technologies
Session Main Conference
Year 2010

We present a random-walk-based approach to learning paraphrases from bilingual parallel corpora. The corpora are represented as a graph in which a node corresponds to a phrase, and an edge exists between two nodes if their corresponding phrases are aligned in a phrase table. We sample random walks to compute the average number of steps it takes to reach a ranking of paraphrases with better ones being “closer” to a phrase of interest. This approach allows “feature” nodes that represent domain knowledge to be built into the graph, and incorporates truncation techniques to prevent the graph from growing too large for efficiency. Current approaches, by contrast, implicitly presuppose the graph to be bipartite, are limited to finding paraphrases that are of length two away from...
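
The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The toy phrase table, the uniform transition probabilities, the function names (`build_graph`, `truncated_hitting_times`), and the parameters `max_steps` and `num_walks` are all assumptions made for illustration. The sketch builds a graph from aligned phrase pairs, samples random walks of bounded length from a query phrase, and averages the step at which each other node is first reached, counting a node that is never reached as `max_steps`.

```python
import random
from collections import defaultdict


def build_graph(phrase_table):
    """Build an undirected graph: one node per phrase, one edge per aligned pair."""
    neighbors = defaultdict(set)
    for src, tgt in phrase_table:
        neighbors[src].add(tgt)
        neighbors[tgt].add(src)
    # Sorted lists keep the sampling order reproducible for a fixed seed.
    return {node: sorted(adj) for node, adj in neighbors.items()}


def truncated_hitting_times(graph, start, max_steps=10, num_walks=2000, seed=0):
    """Estimate, by sampling, the average number of steps a walk from `start`
    needs to first reach each other node, truncated at `max_steps`."""
    rng = random.Random(seed)
    totals = defaultdict(float)
    for _ in range(num_walks):
        first_hit = {}
        current = start
        for step in range(1, max_steps + 1):
            # Assumption: every edge out of a node is equally likely.
            current = rng.choice(graph[current])
            if current != start and current not in first_hit:
                first_hit[current] = step
        for other in graph:
            if other != start:
                # A node the walk never reached contributes the truncation value.
                totals[other] += first_hit.get(other, max_steps)
    return {node: total / num_walks for node, total in totals.items()}


# Hypothetical toy phrase table; nodes are (language, phrase) pairs.
toy_table = [
    (("en", "reach"), ("fr", "atteindre")),
    (("en", "reach"), ("fr", "parvenir à")),
    (("en", "arrive at"), ("fr", "atteindre")),
    (("en", "arrive at"), ("fr", "parvenir à")),
    (("en", "attain"), ("fr", "atteindre")),
    (("en", "attain"), ("fr", "obtenir")),
    (("en", "obtain"), ("fr", "obtenir")),
]

graph = build_graph(toy_table)
times = truncated_hitting_times(graph, ("en", "reach"))

# Rank same-language phrases: a smaller hitting time means "closer".
ranking = sorted((n for n in times if n[0] == "en"), key=times.get)
for node in ranking:
    print(node[1], round(times[node], 2))
```

In this toy graph, "obtain" is four edges away from "reach", so it still receives a score. A method restricted to paths of length two would never consider it.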