Paper: Identifying Relations for Open Information Extraction

ACL ID D11-1142
Title Identifying Relations for Open Information Extraction
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 2011

Open Information Extraction (IE) is the task of extracting assertions from massive corpora without requiring a pre-specified vocabulary. This paper shows that the output of state-of-the-art Open IE systems is rife with uninformative and incoherent extractions. To overcome these problems, we introduce two simple syntactic and lexical constraints on binary relations expressed by verbs. We implemented the constraints in the REVERB Open IE system, which more than doubles the area under the precision-recall curve relative to previous extractors such as TEXTRUNNER and WOEpos. More than 30% of REVERB’s extractions are at precision 0.8 or higher, compared to virtually none for earlier systems. The paper concludes with a detailed analysis of REVERB’s errors, suggesting directions f...
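
The abstract does not spell out the syntactic constraint on verb-expressed relations, so the following is only a minimal sketch of how such a constraint could be checked over part-of-speech tags. It assumes Penn Treebank tags and a pattern of the form "verb, or verb followed by a preposition, or verb followed by intervening words and a closing preposition". The tag classes, the `TAG_CLASS` table, and the `matches_relation_pattern` helper are hypothetical names introduced for illustration, not REVERB's actual implementation.

```python
import re

# Hypothetical mapping from Penn Treebank POS tags to one-letter classes.
# (Assumption: the input has already been tagged with this tag set.)
TAG_CLASS = {
    **dict.fromkeys(["VB", "VBD", "VBG", "VBN", "VBP", "VBZ"], "v"),  # verbs
    **dict.fromkeys(["NN", "NNS", "NNP", "NNPS"], "n"),               # nouns
    **dict.fromkeys(["JJ", "JJR", "JJS"], "j"),                       # adjectives
    **dict.fromkeys(["RB", "RBR", "RBS"], "r"),                       # adverbs
    **dict.fromkeys(["PRP", "PRP$"], "o"),                            # pronouns
    "DT": "d",  # determiner
    "IN": "p",  # preposition
    "RP": "t",  # particle
    "TO": "i",  # infinitive marker
}

# Assumed pattern: a verb (optionally followed by a particle and an adverb),
# optionally followed by any run of nouns/adjectives/adverbs/pronouns/
# determiners that is closed by a preposition, particle, or infinitive marker.
RELATION_PATTERN = re.compile(r"vt?r?(?:[njrod]*[pti])?")


def matches_relation_pattern(tags):
    """Return True if the whole tag sequence fits the assumed pattern."""
    # Tags outside the table map to "x", which the pattern never accepts.
    encoded = "".join(TAG_CLASS.get(tag, "x") for tag in tags)
    return RELATION_PATTERN.fullmatch(encoded) is not None
```

Under these assumptions, a tag sequence such as `["VBZ", "DT", "NN", "IN"]` (for example "is a city in") is accepted, while `["VBD", "DT", "NN"]` is rejected because the phrase does not end in a preposition, particle, or infinitive marker. The lexical constraint mentioned in the abstract is not modeled here.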