Paper: Enhancing Cross Document Coreference of Web Documents with Context Similarity and Very Large Scale Text Categorization

ACL ID C10-1055
Title Enhancing Cross Document Coreference of Web Documents with Context Similarity and Very Large Scale Text Categorization
Venue International Conference on Computational Linguistics
Session Main Conference
Year 2010
Authors

Cross Document Coreference (CDC) is the task of constructing the coreference chain for mentions of a person across a set of documents. This work offers a holistic view of using document-level categories, sub-document level context and extracted entities and relations for the CDC task. We train a categorization component with an efficient flat algorithm using thousands of ODP categories and over a million web documents. We propose to use ranked categories as coreference information, particularly suitable for web documents that are widely different in style and content. An ensemble composite coreference function, amenable to inactive features, combines these three levels of evidence for disambiguation. A thorough feature importance study is conducted to analyze how these three componen...
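
The abstract does not spell out the composite coreference function, so the following is only an illustrative sketch of one way a score could combine several levels of evidence while tolerating inactive features. It assumes a weighted average renormalized over whichever features are available; the function name `composite_similarity`, the feature names, and the weights are all hypothetical and not taken from the paper.

```python
from typing import Dict, Optional


def composite_similarity(
    scores: Dict[str, Optional[float]],
    weights: Dict[str, float],
) -> Optional[float]:
    """Weighted average of per-feature similarity scores.

    A feature whose score is None is treated as inactive and skipped,
    and the remaining weights are renormalized so they sum to one.
    This is an assumed combination rule, not the paper's function.
    """
    # Keep only features that produced a score for this document pair.
    active = {name: s for name, s in scores.items() if s is not None}
    total_weight = sum(weights[name] for name in active)
    if total_weight == 0:
        # No usable evidence: report that no decision can be made.
        return None
    weighted_sum = sum(weights[name] * s for name, s in active.items())
    return weighted_sum / total_weight


# Hypothetical weights for the three levels of evidence named in the abstract.
WEIGHTS = {"category": 0.5, "context": 0.25, "entity": 0.25}

# The entity feature is inactive for this pair, e.g. nothing was extracted.
pair_scores = {"category": 0.8, "context": 0.4, "entity": None}
score = composite_similarity(pair_scores, WEIGHTS)
```

Renormalizing over the active features keeps a missing feature from being read as evidence against coreference, which is the property the phrase "amenable to inactive features" suggests; other designs, such as a learned ensemble, would be equally consistent with the abstract.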