Paper: Divide and Conquer: Crowdsourcing the Creation of Cross-Lingual Textual Entailment Corpora

ACL ID D11-1062
Title Divide and Conquer: Crowdsourcing the Creation of Cross-Lingual Textual Entailment Corpora
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 2011
Authors

We address the creation of cross-lingual textual entailment corpora by means of crowdsourcing. Our goal is to define a cheap and replicable data collection methodology that minimizes the manual work done by expert annotators, without resorting to preprocessing tools or already annotated monolingual datasets. In line with recent works emphasizing the need of large-scale annotation efforts for textual entailment, our work aims to: i) tackle the scarcity of data available to train and evaluate systems, and ii) promote the recourse to crowdsourcing as an effective way to reduce the costs of data collection without sacrificing quality. We show that a complex data creation task, for which even experts usually feature low agreement scores, can be effectively decomposed into simple...