Paper: A Probabilistic Model for Canonicalizing Named Entity Mentions

ACL ID P12-1072
Title A Probabilistic Model for Canonicalizing Named Entity Mentions
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 2012
Authors

We present a statistical model for canonicalizing named entity mentions into a table whose rows represent entities and whose columns are attributes (or parts of attributes). The model is novel in that it incorporates entity context, surface features, first-order dependencies among attribute-parts, and a notion of noise. Transductive learning from a few seeds and a collection of mention tokens combines Bayesian inference and conditional estimation. We evaluate our model and its components on two datasets collected from political blogs and sports news, finding that it outperforms a simple agglomerative clustering approach and previous work.
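
To make the target output concrete, the following is a minimal sketch of the kind of table the abstract describes: rows are entities and columns are attribute-parts. It does not implement the model itself. The column names, the `EntityRow` and `build_table` helpers, and the example mentions are all hypothetical, and the assignment of mentions to rows is supplied by hand here, whereas the paper infers it.

```python
from dataclasses import dataclass, field

# Hypothetical attribute-part columns; the abstract does not name them.
COLUMNS = ("title", "first", "middle", "last")


@dataclass
class EntityRow:
    """One table row: an entity, with a set of observed tokens per column."""
    cells: dict = field(default_factory=lambda: {c: set() for c in COLUMNS})

    def add_mention(self, parts):
        """Record one mention, given as a {column: token} mapping."""
        for column, token in parts.items():
            self.cells[column].add(token)


def build_table(assignments):
    """Build a table from (row_id, {column: token}) pairs.

    The row_id for each mention is given as input here; in the paper it
    would be the result of inference, which this sketch does not attempt.
    """
    table = {}
    for row_id, parts in assignments:
        table.setdefault(row_id, EntityRow()).add_mention(parts)
    return table


# Invented example mentions, already segmented into attribute-parts.
mentions = [
    (0, {"title": "Sen.", "last": "Smith"}),
    (0, {"first": "Jane", "last": "Smith"}),
    (1, {"title": "Mr.", "first": "John", "last": "Doe"}),
]
table = build_table(mentions)
```

In this sketch, the two mentions assigned to row 0 merge into a single entity whose cells hold a title, a first name, and a last name, which is the canonical form the table is meant to capture.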