Paper: Solving the “Who’s Mark Johnson Puzzle”: Information Extraction Based Cross Document Coreference

ACL ID N09-3002
Title Solving the “Who’s Mark Johnson Puzzle”: Information Extraction Based Cross Document Coreference
Venue HLT-NAACL Companion Volume: Student Research Workshop and Doctoral Consortium
Session
Year 2009
Authors

Cross Document Coreference (CDC) is the problem of resolving the underlying identity of entities across multiple documents and is a major step for document understanding. We develop a framework to efficiently determine the identity of a person based on extracted information, which includes unary properties such as gender and title, as well as binary relationships with other named entities such as co-occurrence and geo-locations. At the heart of our approach is a suite of similarity functions (specialists) for matching relationships and a relational density-based clustering algorithm that delineates name clusters based on pairwise similarity. We demonstrate the effectiveness of our methods on the WePS benchmark datasets and point out future research directions.
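
The abstract gives no implementation details, so the following is only a minimal sketch of how specialist similarity functions and density-based clustering over pairwise similarity might fit together. It is not the authors' algorithm. Everything in it is an assumption made for illustration: the `Mention` fields, the exact-match and Jaccard specialists, the unweighted averaging of specialist scores, the DBSCAN-style expansion, and the `min_sim` and `min_pts` thresholds. The sample mentions are made-up placeholders, not WePS data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Mention:
    """Extracted information for one occurrence of a name (hypothetical schema)."""
    doc_id: str
    gender: Optional[str] = None           # unary property
    title: Optional[str] = None            # unary property
    cooccurring: frozenset = frozenset()   # co-occurring named entities
    locations: frozenset = frozenset()     # associated geo-locations


def unary_match(x, y):
    """Return 1.0 on agreement, 0.0 on conflict, None if either value is missing."""
    if x is None or y is None:
        return None
    return 1.0 if x == y else 0.0


def jaccard(a, b):
    """Set overlap in [0, 1]; None if either side carries no evidence."""
    if not a or not b:
        return None
    return len(a & b) / len(a | b)


# Each specialist scores one kind of evidence and abstains (None) when it has none.
SPECIALISTS = [
    lambda m, n: unary_match(m.gender, n.gender),
    lambda m, n: unary_match(m.title, n.title),
    lambda m, n: jaccard(m.cooccurring, n.cooccurring),
    lambda m, n: jaccard(m.locations, n.locations),
]


def similarity(m, n):
    """Average the scores of the specialists that did not abstain (assumed combiner)."""
    scores = [s for s in (f(m, n) for f in SPECIALISTS) if s is not None]
    return sum(scores) / len(scores) if scores else 0.0


def density_cluster(mentions, min_sim=0.5, min_pts=2):
    """DBSCAN-style clustering on pairwise similarity; -1 marks an unclustered mention."""
    n = len(mentions)
    # A mention's neighborhood is itself plus every mention at least min_sim similar.
    neighbors = [
        [j for j in range(n)
         if j == i or similarity(mentions[i], mentions[j]) >= min_sim]
        for i in range(n)
    ]
    labels = [None] * n
    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = -1                  # not dense enough to seed a cluster
            continue
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster         # border mention: absorbed, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:
                queue.extend(neighbors[j])  # core mention: keep growing the cluster
        cluster += 1
    return labels


# Placeholder mentions of one ambiguous name in three documents.
a = Mention("d1", "male", "professor", frozenset({"Org A", "Person B"}), frozenset({"City X"}))
b = Mention("d2", "male", "professor", frozenset({"Person B"}), frozenset({"City X"}))
c = Mention("d3", "male", "coach", frozenset({"Org C"}), frozenset({"City Y"}))

labels = density_cluster([a, b, c])
print(labels)  # [0, 0, -1]
```

In this sketch, mentions `a` and `b` agree on gender, title, and location and share one co-occurring entity, so they fall into the same cluster. Mention `c` matches them only on gender and stays unclustered, which here is read as a separate person.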