Paper: Clustering and Matching Headlines for Automatic Paraphrase Acquisition

ACL ID W09-0621
Title Clustering and Matching Headlines for Automatic Paraphrase Acquisition
Venue European Workshop on Natural Language Generation
Session
Year 2009
Authors

For developing a data-driven text rewriting algorithm for paraphrasing, it is essential to have a monolingual corpus of aligned paraphrased sentences. News article headlines are a rich source of paraphrases; they tend to describe the same event in various different ways, and can easily be obtained from the web. We compare two methods of aligning headlines to construct such an aligned corpus of paraphrases, one based on clustering, and the other on pairwise similarity-based matching. We show that the latter performs best on the task of aligning paraphrastic headlines.
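
To make the idea of pairwise similarity-based matching concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not specify the similarity measure, tokenization, or threshold; the bag-of-words cosine similarity, the whitespace tokenizer, the function names `cosine` and `match_headlines`, and the threshold of 0.5 are all assumptions chosen for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def match_headlines(headlines, threshold=0.5):
    """Return (headline, headline, score) for every pair scoring at or above threshold.

    The threshold value is a hypothetical choice, not one taken from the paper.
    """
    # Assumed preprocessing: lowercase and split on whitespace.
    bags = [Counter(h.lower().split()) for h in headlines]
    pairs = []
    for i, j in combinations(range(len(headlines)), 2):
        score = cosine(bags[i], bags[j])
        if score >= threshold:
            pairs.append((headlines[i], headlines[j], score))
    return pairs


# Made-up example headlines, for illustration only.
headlines = [
    "Storm hits coast of Florida",
    "Florida coast hit by storm",
    "Stocks rally on earnings",
]
pairs = match_headlines(headlines)
for first, second, score in pairs:
    print(f"{score:.2f}  {first}  <->  {second}")
```

In this sketch each headline pair is scored independently, so a headline can be aligned with several others or with none. A clustering-based approach would instead assign every headline to a group first and treat the members of a group as mutual paraphrases.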