Paper: Syntactic Constraints on Paraphrases Extracted from Parallel Corpora

ACL ID D08-1021
Title Syntactic Constraints on Paraphrases Extracted from Parallel Corpora
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 2008
Authors

We improve the quality of paraphrases extracted from parallel corpora by requiring that phrases and their paraphrases be the same syntactic type. This is achieved by parsing the English side of a parallel corpus and altering the phrase extraction algorithm to extract phrase labels alongside bilingual phrase pairs. In order to retain broad coverage of non-constituent phrases, complex syntactic labels are introduced. A manual evaluation indicates a 19% absolute improvement in paraphrase quality over the baseline method.
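
The constraint described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that paraphrase candidates are English phrases aligned to the same foreign phrase, and it omits any probability scoring. The function name `extract_paraphrases`, the toy phrases, the placeholder foreign phrases (`f1`, `f2`, `f3`), and the label strings (including the slash-style `VP/NP` standing in for a complex label on a non-constituent phrase) are all hypothetical.

```python
from collections import defaultdict


def extract_paraphrases(labeled_pairs, require_same_label=True):
    """Group English phrases that share a foreign phrase.

    labeled_pairs: iterable of (english_phrase, foreign_phrase, label) triples,
    where label is the syntactic label extracted alongside the phrase pair.
    With require_same_label=True, a candidate is kept only if its label
    matches the label of the original phrase.
    """
    # Index English phrases (with their labels) by the foreign phrase they align to.
    by_foreign = defaultdict(set)
    for english, foreign, label in labeled_pairs:
        by_foreign[foreign].add((english, label))

    paraphrases = defaultdict(set)
    for candidates in by_foreign.values():
        for phrase, label in candidates:
            for other, other_label in candidates:
                if other == phrase:
                    continue  # a phrase is not its own paraphrase
                if require_same_label and other_label != label:
                    continue  # syntactic constraint: labels must match
                paraphrases[(phrase, label)].add(other)
    return {key: sorted(values) for key, values in paraphrases.items()}


# Toy data, invented for illustration only (not taken from the paper).
pairs = [
    ("under control", "f1", "PP"),
    ("in check", "f1", "PP"),
    ("controlled", "f1", "VBN"),
    ("the deal", "f2", "NP"),
    ("the agreement", "f2", "NP"),
    # Hypothetical complex label for a non-constituent phrase.
    ("wants to make", "f3", "VP/NP"),
    ("wishes to make", "f3", "VP/NP"),
]

constrained = extract_paraphrases(pairs)
unconstrained = extract_paraphrases(pairs, require_same_label=False)

print(constrained[("under control", "PP")])
print(unconstrained[("under control", "PP")])
```

In this toy setup the unconstrained variant pairs "under control" with "controlled" even though their labels differ, while the constrained variant keeps only "in check". Because labels are treated as opaque strings, a complex label is matched in the same way as an ordinary constituent label.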