Paper: Robust Entity Clustering via Phylogenetic Inference

ACL ID P14-1073
Title Robust Entity Clustering via Phylogenetic Inference
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 2014

Entity clustering must determine when two named-entity mentions refer to the same entity. Typical approaches use a pipeline architecture that clusters the mentions using fixed or learned measures of name and context similarity. In this paper, we propose a model for cross-document coreference resolution that achieves robustness by learning similarity from unlabeled data. The generative process assumes that each entity mention arises from copying and optionally mutating an earlier name from a similar context. Clustering the mentions into entities depends on recovering this copying tree jointly with estimating models of the mutation process and parent selection process. We present a block Gibbs sampler for posterior inference and an empirical evaluation on several datasets.
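
As an illustration only, the copy-and-mutate generative story described above can be sketched as a toy simulation. Everything specific here is an assumption made for the sketch and not taken from the paper: the function names `generate_mentions` and `mutate`, the uniform choice of parent, the single-character deletion used as the mutation, and the fixed probabilities `p_new` and `p_mutate`. The model in the paper instead estimates the mutation and parent-selection processes and ties parent choice to context similarity.

```python
import random


def mutate(name, rng):
    """Hypothetical mutation: delete one random character (a stand-in for a learned model)."""
    if len(name) <= 1:
        return name
    k = rng.randrange(len(name))
    return name[:k] + name[k + 1:]


def generate_mentions(n, seed_names, p_new=0.3, p_mutate=0.2, seed=0):
    """Generate n mentions; each one either starts a new entity or copies an earlier mention."""
    rng = random.Random(seed)
    names, parents, entities = [], [], []
    next_entity = 0
    for i in range(n):
        if i == 0 or rng.random() < p_new:
            # Root of a new copying tree: no parent, fresh entity id.
            parent = None
            name = rng.choice(seed_names)
            entity = next_entity
            next_entity += 1
        else:
            # Copy an earlier mention (chosen uniformly here, an assumption),
            # then optionally mutate the copied name.
            parent = rng.randrange(i)
            name = names[parent]
            if rng.random() < p_mutate:
                name = mutate(name, rng)
            entity = entities[parent]
        names.append(name)
        parents.append(parent)
        entities.append(entity)
    return names, parents, entities


if __name__ == "__main__":
    names, parents, entities = generate_mentions(8, ["Barack Obama", "Hillary Clinton"])
    for i, (nm, pa, en) in enumerate(zip(names, parents, entities)):
        print(i, repr(nm), "parent:", pa, "entity:", en)
```

In this sketch, the `parents` list encodes the copying tree and each mention inherits its entity from its parent, so the entity clusters are exactly the trees rooted at parentless mentions. Inference runs in the opposite direction: given only the observed names and contexts, the tree and the model parameters must be recovered.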