Paper: Multi-Field Information Extraction And Cross-Document Fusion

ACL ID P05-1060
Title Multi-Field Information Extraction And Cross-Document Fusion
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 2005
Authors

In this paper, we examine the task of extracting a set of biographic facts about target individuals from a collection of Web pages. We automatically annotate training text with positive and negative examples of fact extractions and train Rote, Naïve Bayes, and Conditional Random Field extraction models for fact extraction from individual Web pages. We then propose and evaluate methods for fusing the extracted information across documents to return a consensus answer. A novel cross-field bootstrapping method leverages data interdependencies to yield improved performance.
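
As an illustration of what returning a consensus answer across documents could look like, the sketch below fuses per-page extractions of a single field by simple majority vote. The abstract does not specify the paper's fusion methods, so the voting rule, the function name `fuse_by_vote`, and the sample values are all assumptions made for this example, not the authors' approach.

```python
from collections import Counter
from typing import Iterable, Optional


def fuse_by_vote(candidates: Iterable[Optional[str]]) -> Optional[str]:
    """Return the most frequent non-empty candidate, or None if there are none.

    Assumption: plain majority voting over normalized strings. The paper's
    own fusion methods are not described in the abstract.
    """
    # Normalize whitespace and case so trivially different strings vote together.
    counts = Counter(c.strip().lower() for c in candidates if c and c.strip())
    if not counts:
        return None
    # most_common orders by count; equal counts keep first-seen order.
    return counts.most_common(1)[0][0]


# Hypothetical extractions of one field (birth year) from five Web pages.
per_page_birth_year = ["1756", "1756", "1791", None, "1756 "]
print(fuse_by_vote(per_page_birth_year))
```

Voting of this kind treats every page as equally reliable; a weighted variant could instead sum per-extraction confidence scores from the underlying model.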