Paper: Jumping Distance based Chinese Person Name Disambiguation

ACL ID W10-4158
Venue Joint Conference on Chinese Language Processing
Session Main Conference
Year 2010

In this paper, we describe a Chinese person name disambiguation system for news articles and report the results obtained on the data set of the CLP 2010 Bakeoff-3. The main task of the Bakeoff is to distinguish the different persons who share the same person-name string across news stories. Compared to traditional methods, our system uses two additional features: 1) n-grams that co-occur with the target name string; 2) the jumping distance among the n-grams. On this basis, we propose a two-stage clustering algorithm to improve the low recall.

1 Our Novel Try

For this task, we propose a Jumping-Distance based n-gram model (abbr. DJ n-gram) to describe the semantics of the closest contexts of the target person-name strings. The generation of the DJ n-gram model mainly in...