Paper: Extracting Sequences from the Web

ACL ID P10-2053
Title Extracting Sequences from the Web
Venue Annual Meeting of the Association for Computational Linguistics
Session Short Paper
Year 2010

Classical Information Extraction (IE) systems fill slots in domain-specific frames. This paper reports on SEQ, a novel open IE system that leverages a domain-independent frame to extract ordered sequences, such as the presidents of the United States or the most common causes of death in the U.S. SEQ exploits regularities about sequences to extract a coherent set of sequences from Web text. Compared to an extractor that does not exploit these regularities, SEQ nearly doubles the area under the precision-recall curve.
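The abstract states the result as area under the precision-recall curve. The following is a minimal sketch of how that metric can be computed for a list of extractions ranked by confidence. It assumes each extraction has already been labelled correct or incorrect and that the total number of correct sequences (the recall denominator) is known. It uses trapezoidal interpolation between points. The function names `precision_recall_points` and `pr_auc` are hypothetical, and this is not necessarily the evaluation procedure used in the paper.

```python
from typing import List, Sequence, Tuple


def precision_recall_points(
    labels: Sequence[bool], total_correct: int
) -> List[Tuple[float, float]]:
    """Return (recall, precision) after each extraction, in confidence order.

    labels[k] is True if the k-th highest-confidence extraction is correct.
    total_correct is the (assumed known) number of correct items overall.
    """
    if total_correct <= 0:
        raise ValueError("total_correct must be positive")
    points = []
    true_positives = 0
    for k, is_correct in enumerate(labels, start=1):
        if is_correct:
            true_positives += 1
        points.append((true_positives / total_correct, true_positives / k))
    return points


def pr_auc(labels: Sequence[bool], total_correct: int) -> float:
    """Area under the precision-recall curve using the trapezoidal rule.

    The curve is anchored at recall 0 with the precision of the first
    extraction, so an all-correct ranking that reaches full recall scores 1.0.
    """
    points = precision_recall_points(labels, total_correct)
    if not points:
        return 0.0
    area = 0.0
    prev_recall, prev_precision = 0.0, points[0][1]
    for recall, precision in points:
        # Trapezoid between consecutive points; zero width when recall is unchanged.
        area += (recall - prev_recall) * (precision + prev_precision) / 2
        prev_recall, prev_precision = recall, precision
    return area
```

For example, `pr_auc([True, True, False, True, False], total_correct=4)` returns about 0.677. Comparing two extractors on the same ranked output size then reduces to comparing their `pr_auc` values. Step-wise interpolation (as in average precision) is a common alternative to the trapezoidal rule and can give slightly different numbers.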