ACL Anthology Network (All About NLP) (beta) The Association Of Computational Linguistics Anthology Network |
ACL ID | W09-0621 |
---|---|
Title | Clustering and Matching Headlines for Automatic Paraphrase Acquisition |
Venue | European Workshop on Natural Language Generation |
Session | |
Year | 2009 |
Authors | |
For developing a data-driven text rewriting algorithm for paraphrasing, it is essential to have a monolingual corpus of aligned paraphrased sentences. News article headlines are a rich source of paraphrases: they tend to describe the same event in many different ways, and they can easily be obtained from the web. We compare two methods of aligning headlines to construct such an aligned corpus of paraphrases, one based on clustering and the other on pairwise similarity-based matching. We show that the latter performs better on the task of aligning paraphrastic headlines.
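As a rough illustration of pairwise similarity-based matching, the sketch below aligns every pair of headlines whose bag-of-words cosine similarity reaches a threshold. The tokenizer, the cosine measure, the 0.5 threshold, and the function names `tokenize`, `cosine`, and `match_headlines` are all assumptions made for illustration. The abstract does not specify which similarity measure or threshold the authors actually used.

```python
import math
from collections import Counter
from itertools import combinations

# Assumed preprocessing: punctuation stripped from token edges.
PUNCT = ".,:;!?\"'"


def tokenize(headline: str) -> list[str]:
    """Lowercase, split on whitespace, and strip edge punctuation (hypothetical preprocessing)."""
    tokens = (tok.strip(PUNCT) for tok in headline.lower().split())
    return [tok for tok in tokens if tok]


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors; 0.0 if either is empty."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def match_headlines(headlines: list[str], threshold: float = 0.5) -> list[tuple[str, str, float]]:
    """Compare every pair of headlines and keep those scoring at or above the
    (hypothetical) threshold as candidate paraphrase pairs."""
    vectors = [Counter(tokenize(h)) for h in headlines]
    pairs = []
    for i, j in combinations(range(len(headlines)), 2):
        sim = cosine(vectors[i], vectors[j])
        if sim >= threshold:
            pairs.append((headlines[i], headlines[j], sim))
    return pairs
```

For example, calling `match_headlines` on the invented headlines "Earthquake hits northern Japan", "Strong earthquake hits northern Japan" and "Stocks fall sharply in Tokyo" returns a single pair: the two earthquake headlines, with a similarity of about 0.894. A clustering-based approach would instead group headlines and treat members of the same cluster as paraphrases. Matching, by contrast, accepts or rejects each candidate pair on its own score.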