Paper: A Graph-Based Approach to Named Entity Categorization in Wikipedia Using Conditional Random Fields

ACL ID D07-1068
Title A Graph-Based Approach to Named Entity Categorization in Wikipedia Using Conditional Random Fields
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 2007
Authors

This paper presents a method for categorizing named entities in Wikipedia. In Wikipedia, an anchor text is glossed in a linked HTML text. We formalize named entity categorization as the task of categorizing anchor texts whose linked HTML texts gloss a named entity. Using this representation, we introduce a graph structure in which anchor texts are regarded as nodes. To incorporate HTML structure into the graph, three types of cliques are defined based on the HTML tree structure. We propose a method with Conditional Random Fields (CRFs) to categorize the nodes on the graph. Since the defined graph may include cycles, exact inference in CRFs is computationally expensive. We introduce an approximate inference method using Tree-based Reparameterization (TRP) to reduc...
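
The graph construction described in the abstract can be illustrated with a minimal sketch. The abstract does not define the three clique types, so the code below uses a single hypothetical rule as a stand-in: two anchors are linked when they appear consecutively within the same HTML list. The class `AnchorCollector`, the function `build_edges`, and the sample markup are all assumptions made for illustration and are not taken from the paper.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collect anchor texts together with the id of their enclosing list."""

    def __init__(self):
        super().__init__()
        self.list_stack = []    # ids of the currently open <ul>/<ol> elements
        self.next_list_id = 0
        self.in_anchor = False
        self._text = []
        self.anchors = []       # (anchor_text, enclosing_list_id or None)

    def handle_starttag(self, tag, attrs):
        if tag in ("ul", "ol"):
            self.list_stack.append(self.next_list_id)
            self.next_list_id += 1
        elif tag == "a":
            self.in_anchor = True
            self._text = []

    def handle_endtag(self, tag):
        if tag in ("ul", "ol") and self.list_stack:
            self.list_stack.pop()
        elif tag == "a" and self.in_anchor:
            parent = self.list_stack[-1] if self.list_stack else None
            self.anchors.append(("".join(self._text).strip(), parent))
            self.in_anchor = False

    def handle_data(self, data):
        if self.in_anchor:
            self._text.append(data)


def build_edges(anchors):
    """Hypothetical rule: link consecutive anchors that share the same list."""
    edges = []
    last_in_list = {}  # list id -> index of the most recent anchor in that list
    for i, (_, list_id) in enumerate(anchors):
        if list_id is None:
            continue
        if list_id in last_in_list:
            edges.append((last_in_list[list_id], i))
        last_in_list[list_id] = i
    return edges


# Made-up markup: a list of anchors with one nested list.
html = (
    "<ul>"
    "<li><a href='/A'>Alpha</a></li>"
    "<li><a href='/B'>Beta</a><ul><li><a href='/C'>Gamma</a></li></ul></li>"
    "<li><a href='/D'>Delta</a></li>"
    "</ul>"
)

collector = AnchorCollector()
collector.feed(html)
collector.close()
edges = build_edges(collector.anchors)
print(collector.anchors)
print(edges)
```

Each anchor becomes a node, and each edge is a pairwise clique over which a CRF could define a potential. Edges drawn from several such rules at once can create cycles in the graph, which is the situation the abstract cites as the reason for approximate inference.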