Paper: Learning Sentential Paraphrases from Bilingual Parallel Corpora for Text-to-Text Generation

ACL ID D11-1108
Title Learning Sentential Paraphrases from Bilingual Parallel Corpora for Text-to-Text Generation
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 2011
Authors

Previous work has shown that high quality phrasal paraphrases can be extracted from bilingual parallel corpora. However, it is not clear whether bitexts are an appropriate resource for extracting more sophisticated sentential paraphrases, which are more obviously learnable from monolingual parallel corpora. We extend bilingual paraphrase extraction to syntactic paraphrases and demonstrate its ability to learn a variety of general paraphrastic transformations, including passivization, dative shift, and topicalization. We discuss how our model can be adapted to many text generation tasks by augmenting its feature set, development data, and parameter estimation routine. We illustrate this adaptation by using our paraphrase model for the task of sentence compression and achieve r...