Paper: Extracting Parallel Fragments from Comparable Corpora for Data-to-text Generation

ACL ID W10-4217
Title Extracting Parallel Fragments from Comparable Corpora for Data-to-text Generation
Venue International Conference on Natural Language Generation
Session Main Conference
Year 2010
Authors

Building NLG systems, in particular statistical ones, requires parallel data (paired inputs and outputs), which do not generally occur naturally. In this paper, we investigate the idea of automatically extracting parallel resources for data-to-text generation from comparable corpora obtained from the Web. We describe our comparable corpus of data and texts relating to British hills, and the techniques we have developed so far for extracting paired input/output fragments from it.