Paper: Open Information Extraction Using Wikipedia

ACL ID: P10-1013
Title: Open Information Extraction Using Wikipedia
Venue: Annual Meeting of the Association for Computational Linguistics
Session: Main Conference
Year: 2010

Information-extraction (IE) systems seek to distill semantic relations from natural-language text, but most systems use supervised learning of relation-specific examples and are thus limited by the availability of training data. Open IE systems such as TextRunner, on the other hand, aim to handle the unbounded number of relations found on the Web. But how well can these open systems perform? This paper presents WOE, an open IE system which improves dramatically on TextRunner’s precision and recall. The key to WOE’s performance is a novel form of self-supervised learning for open extractors: using heuristic matches between Wikipedia infobox attribute values and corresponding sentences to construct training data. Like TextRunner, WOE’s extractor eschews lexicalized features...
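To make the self-supervision idea concrete, here is a minimal sketch of matching infobox attribute values to article sentences. It assumes a deliberately naive heuristic: a sentence becomes a training example for an attribute if it mentions both the article's subject (by full name only) and the attribute's value. This is not the matcher described in the paper. The names `match_infobox`, `TrainingExample`, `split_sentences`, and `mentions` are hypothetical, and the sentence splitter is a simple regex rather than a real tokenizer.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingExample:
    # Hypothetical record pairing an infobox fact with a sentence that expresses it.
    sentence: str
    subject: str
    attribute: str
    value: str


def split_sentences(text: str) -> list[str]:
    # Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def mentions(sentence: str, phrase: str) -> bool:
    # Case-insensitive whole-phrase match on word boundaries.
    pattern = r"\b" + re.escape(phrase) + r"\b"
    return re.search(pattern, sentence, re.IGNORECASE) is not None


def match_infobox(subject: str, infobox: dict[str, str], article: str) -> list[TrainingExample]:
    """For each infobox attribute, keep the first sentence that mentions
    both the subject's full name and the attribute value (simplified heuristic)."""
    sentences = split_sentences(article)
    examples = []
    for attribute, value in infobox.items():
        for sentence in sentences:
            if mentions(sentence, subject) and mentions(sentence, value):
                examples.append(TrainingExample(sentence, subject, attribute, value))
                break  # at most one sentence per attribute
    return examples


if __name__ == "__main__":
    article = (
        "Stanford University is a private research university. "
        "Stanford University was founded in 1885 by Leland Stanford. "
        "The campus is located in Stanford, California."
    )
    infobox = {"established": "1885", "city": "Stanford, California"}
    for ex in match_infobox("Stanford University", infobox, article):
        print(ex.attribute, "->", ex.sentence)
```

In this toy input only the `established` attribute yields an example. The `city` value appears in a sentence that never names the subject in full, which shows why a real matcher would need additional ways of recognizing references to the article's subject beyond exact full-name matches.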