Paper: Overcoming the Lack of Parallel Data in Sentence Compression

ACL ID: D13-1155
Title: Overcoming the Lack of Parallel Data in Sentence Compression
Venue: Conference on Empirical Methods in Natural Language Processing
Session: Main Conference
Year: 2013
Authors:

A major challenge in supervised sentence compression is making use of rich feature representations because of very scarce parallel data. We address this problem and present a method to automatically build a compression corpus with hundreds of thousands of instances on which deletion-based algorithms can be trained. In our corpus, the syntactic trees of the compressions are subtrees of their uncompressed counterparts, and hence supervised systems which require a structural alignment between the input and output can be successfully trained. We also extend an existing unsupervised compression method with a learning module. The new system uses structured prediction to learn from lexical, syntactic and other features. An evaluation with human raters shows that the presented da...
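The abstract does not describe how the corpus is extracted, but the property it states (each compression's syntactic tree is a subtree of the original's) can be checked mechanically. The following is a minimal sketch under assumptions that are not from the paper: the original sentence's dependency tree is given as a list of head indices (-1 for the root), the compression is aligned to the original by greedy leftmost token matching (a simplification that can misalign repeated words), and "subtree" is read as "the kept tokens form one connected fragment of the original tree". The function names `align_deletion`, `is_connected_subtree` and `is_valid_compression` are hypothetical.

```python
from typing import List, Optional, Set


def align_deletion(original: List[str], compression: List[str]) -> Optional[List[int]]:
    """Map each compression token to the leftmost unused matching original token.

    Returns the kept original indices, or None if the compression is not a
    subsequence of the original (i.e. it cannot be produced by deletion alone).
    Greedy leftmost matching is an assumed simplification.
    """
    kept: List[int] = []
    i = 0
    for tok in compression:
        while i < len(original) and original[i] != tok:
            i += 1
        if i == len(original):
            return None
        kept.append(i)
        i += 1
    return kept


def is_connected_subtree(heads: List[int], kept: Set[int]) -> bool:
    """heads[i] is the index of token i's head, or -1 for the sentence root.

    Because the original heads form a tree, the kept tokens are connected
    exactly when one of them (the compression's root) has its head outside
    the kept set and every other kept token's head is also kept.
    """
    if not kept:
        return False
    roots = [i for i in kept if heads[i] not in kept]
    return len(roots) == 1


def is_valid_compression(original: List[str], heads: List[int], compression: List[str]) -> bool:
    """Deletion-only AND the kept tokens form a subtree of the original parse."""
    kept = align_deletion(original, compression)
    return kept is not None and is_connected_subtree(heads, set(kept))


if __name__ == "__main__":
    # Hand-written (illustrative) parse: "sat" is the root; "on" attaches to
    # "sat"; "mat" attaches to "on"; each determiner attaches to its noun.
    tokens = "The cat sat on the mat".split()
    heads = [1, 2, -1, 2, 5, 3]
    print(is_valid_compression(tokens, heads, "cat sat on the mat".split()))  # True
    print(is_valid_compression(tokens, heads, "cat sat mat".split()))  # False: "on" removed
    print(is_valid_compression(tokens, heads, "mat sat".split()))  # False: reordered
```

In the second call, dropping "on" while keeping "mat" disconnects "mat" from the rest of the tree. A pair like this could not be produced by pruning the original parse, which is why a corpus satisfying the subtree property gives tree-based learners a usable structural alignment.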