Paper: Big Data versus the Crowd: Looking for Relationships in All the Right Places

ACL ID P12-1087
Title Big Data versus the Crowd: Looking for Relationships in All the Right Places
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 2012
Authors

Classically, training relation extractors relies on high-quality, manually annotated training data, which can be expensive to obtain. To mitigate this cost, NLU researchers have considered two newly available sources of less expensive (but potentially lower quality) labeled data from distant supervision and crowd sourcing. There is, however, no study comparing the relative impact of these two sources on the precision and recall of post-learning answers. To fill this gap, we empirically study how state-of-the-art techniques are affected by scaling these two sources. We use corpus sizes of up to 100 million documents and tens of thousands of crowd-source labeled examples. Our experiments show that increasing the corpus size for distant supervision has a statistically significant,...
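The abstract contrasts two cheap label sources. As a rough illustration of why distantly supervised labels can be "potentially lower quality", the Python sketch below applies the commonly used distant-supervision heuristic: any sentence mentioning both the subject and the object of a knowledge-base triple is labeled with that triple's relation. It then scores those labels against a handful of human judgments standing in for crowd-sourced annotations. This is not the paper's pipeline; the functions `distant_label` and `precision`, the toy knowledge base, and the "crowd" judgments are all hypothetical, and entity matching is naive substring search.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]          # (subject, relation, object)
Example = Tuple[str, str, str, str]    # (sentence, subject, object, relation)


def distant_label(sentences: List[str], kb: List[Triple]) -> List[Example]:
    """Heuristically label sentences with knowledge-base relations.

    Assumed heuristic (hypothetical helper): a sentence that contains both the
    subject and the object of a KB triple becomes a positive example of that
    triple's relation. Matching is plain substring search, so it is noisy.
    """
    examples: List[Example] = []
    for sentence in sentences:
        for subj, rel, obj in kb:
            if subj in sentence and obj in sentence:
                examples.append((sentence, subj, obj, rel))
    return examples


def precision(labeled: List[Example], gold: Set[Example]) -> float:
    """Fraction of heuristic labels that the (hypothetical) human judges accept."""
    if not labeled:
        return 0.0
    correct = sum(1 for ex in labeled if ex in gold)
    return correct / len(labeled)


if __name__ == "__main__":
    kb = [("Paris", "capital_of", "France")]
    sentences = [
        "Paris is the capital of France.",
        "Paris Hilton flew to France last week.",  # both mentions match, wrong relation
        "Berlin is in Germany.",                   # no match, so no label
    ]
    labeled = distant_label(sentences, kb)
    # Hypothetical crowd judgments: only the first sentence expresses capital_of.
    crowd_gold = {("Paris is the capital of France.", "Paris", "France", "capital_of")}
    print(len(labeled))                    # 2
    print(precision(labeled, crowd_gold))  # 0.5
```

The trade-off the abstract studies shows up even at this toy scale: the heuristic produces labels for free from any amount of raw text, but half of them are wrong here. Human (crowd) labels are cleaner but each one costs money.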