Paper: Effective Information Extraction with Semantic Affinity Patterns and Relevant Regions

ACL ID D07-1075
Title Effective Information Extraction with Semantic Affinity Patterns and Relevant Regions
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 2007
Authors

We present an information extraction system that decouples the tasks of finding relevant regions of text and applying extraction patterns. We create a self-trained relevant sentence classifier to identify relevant regions, and use a semantic affinity measure to automatically learn domain-relevant extraction patterns. We then distinguish primary patterns from secondary patterns and apply the patterns selectively in the relevant regions. The resulting IE system achieves good performance on the MUC-4 terrorism corpus and ProMed disease outbreak stories. This approach requires only a few seed extraction patterns and a collection of relevant and irrelevant documents for training.
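
The decoupling described in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation, and everything in it is an assumption made for illustration: the regular expressions standing in for extraction patterns, the keyword test standing in for the self-trained sentence classifier, the scoring formula (the share of a pattern's extractions that fall in a category, weighted by the log of that count), and the policy of applying primary patterns everywhere while restricting secondary patterns to relevant sentences.

```python
import math
import re
from collections import Counter


def semantic_affinity(counts: Counter, category: str) -> float:
    """Hypothetical affinity score (an assumption, not taken from the abstract):
    fraction of a pattern's extractions in `category`, times log2 of that count."""
    f_cat = counts[category]
    total = sum(counts.values())
    if f_cat == 0 or total == 0:
        return 0.0
    return (f_cat / total) * math.log2(f_cat)


def extract(sentences, is_relevant, primary, secondary):
    """Apply primary patterns to every sentence and secondary patterns
    only to sentences the relevance test accepts (assumed policy)."""
    found = []
    for sentence in sentences:
        active = list(primary)
        if is_relevant(sentence):
            active.extend(secondary)
        for pattern in active:
            found.extend(m.group(1) for m in pattern.finditer(sentence))
    return found


# Toy patterns and data, invented for this example.
primary = [re.compile(r"assassination of (\w+)")]
secondary = [re.compile(r"attack on (\w+)")]
sentences = [
    "The assassination of Romero shocked the country.",
    "Critics mounted an attack on policy during the debate.",
    "Guerrillas launched an attack on civilians with a bomb.",
]


def is_relevant(sentence: str) -> bool:
    # Keyword stand-in for the self-trained relevant sentence classifier.
    return "bomb" in sentence


victims = extract(sentences, is_relevant, primary, secondary)
score = semantic_affinity(Counter({"victim": 8, "weapon": 2}), "victim")
```

In this toy run the ambiguous secondary pattern fires only in the sentence judged relevant, so the figurative "attack on policy" is not extracted, while the unambiguous primary pattern fires regardless of relevance.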