Paper: Extending a Thesaurus with Words from Pan-Chinese Sources

ACL ID C08-1058
Title Extending a Thesaurus with Words from Pan-Chinese Sources
Venue International Conference on Computational Linguistics
Session Main Conference
Year 2008

In this paper, we extend a Chinese thesaurus with words distinctly used in various Chinese communities. The acquisition and classification of such region-specific lexical items is an important step toward the larger goal of constructing a Pan-Chinese lexical resource. In particular, we extend a previous study in three respects: (1) improving automatic classification by removing duplicated words from the thesaurus, (2) experimenting with classifying words at the subclass level and the semantic head level, and (3) further investigating the possible effects on classification performance of data heterogeneity between the region-specific words and the words in the thesaurus. Automatic classification was based on the similarity between a target word and individual...