Paper: Combine Person Name and Person Identity Recognition and Document Clustering for Chinese Person Name Disambiguation

ACL ID W10-4154
Title Combine Person Name and Person Identity Recognition and Document Clustering for Chinese Person Name Disambiguation
Venue Joint Conference on Chinese Language Processing
Session Main Conference
Year 2010
Authors

This paper presents the HITSZ_CITYU system in the CIPS-SIGHAN bakeoff 2010 Task 3, Chinese person name disambiguation. The system combines person name string recognition, person identity string recognition, and agglomerative hierarchical clustering to group the documents that refer to the same person. First, for the given name index string, three segmentors are each applied to segment the sentences containing the index string into Chinese words. Their outputs are compared and analyzed, and an unsupervised clustering is applied to help the person name recognition. The document set is then divided into subsets according to each recognized person name string. Next, the system identifies/extracts the person identity string from the sentences based on le...
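The document-grouping step named in the abstract can be illustrated with a minimal sketch of agglomerative hierarchical clustering. This is not the system's implementation: the average-link criterion, the cosine similarity over bag-of-feature counts, the stopping `threshold`, and the names `cosine` and `agglomerative` are all assumptions made for illustration, since the abstract does not specify them.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse feature-count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def agglomerative(docs: list[list[str]], threshold: float = 0.3) -> list[list[int]]:
    """Average-link agglomerative clustering (assumed linkage, not from the paper).

    Each document is a list of feature strings. Returns clusters of document indices.
    """
    vectors = [Counter(d) for d in docs]
    clusters = [[i] for i in range(len(docs))]  # start with one cluster per document

    def link(c1: list[int], c2: list[int]) -> float:
        # Mean pairwise similarity between the members of two clusters.
        total = sum(cosine(vectors[i], vectors[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Find the most similar pair of clusters (p < q by construction).
        (x, y), best = max(
            (((p, q), link(clusters[p], clusters[q]))
             for p, q in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best < threshold:  # hypothetical stopping rule
            break
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


# Toy documents mentioning the same name; the feature strings are invented.
docs = [
    ["professor", "university", "physics"],
    ["professor", "university", "research"],
    ["actor", "film", "award"],
    ["actor", "film", "director"],
]
print(agglomerative(docs))
```

On this toy input the two "professor" documents and the two "actor" documents end up in separate clusters, which mirrors the idea of one cluster per distinct person sharing a name.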