ACL ID D07-1034
Title Extending a Thesaurus in the Pan-Chinese Context
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 2007

In this paper, we address a unique problem in Chinese language processing and report on our study on extending a Chinese thesaurus with region-specific words, mostly from the financial domain, from various Chinese speech communities. With the larger goal of automatically constructing a Pan-Chinese lexical resource, this work aims at taking an existing semantic classificatory structure as leverage and incorporating new words into it. In particular, it is important to see if the classification could accommodate new words from heterogeneous data sources, and whether simple similarity measures and clustering methods could cope with such variation. We use the cosine function for similarity and test it on automatically classifying 120 target words from four regions, using different dat...
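To illustrate how cosine similarity can place a new word into an existing classificatory structure, here is a minimal sketch under stated assumptions. It is not the authors' method. The sketch assumes three things: each word is represented by a sparse vector of context-feature counts, each existing semantic class is summarized by the centroid of its member words' vectors, and a target word is assigned to the class whose centroid is most similar to it. The function names (`cosine`, `centroid`, `classify`) and the toy class labels and features are hypothetical.

```python
import math
from collections import Counter


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse feature vectors (0.0 if either is empty)."""
    dot = sum(w * v[f] for f, w in u.items() if f in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def centroid(vectors: list) -> Counter:
    """Average the member vectors of a class (assumed class representation)."""
    total = Counter()
    for vec in vectors:
        total.update(vec)  # Counter.update adds counts feature by feature
    n = len(vectors)
    return Counter({f: w / n for f, w in total.items()})


def classify(target: Counter, classes: dict) -> list:
    """Rank hypothetical class labels by cosine similarity of the target to each centroid."""
    scores = {label: cosine(target, centroid(members)) for label, members in classes.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy example: context-feature counts are invented purely for illustration.
classes = {
    "finance": [Counter({"stock": 3, "market": 2}), Counter({"stock": 1, "bank": 2})],
    "weather": [Counter({"rain": 2, "sky": 3})],
}
target = Counter({"stock": 2, "bank": 1})
print(classify(target, classes))
```

Ranking all classes rather than returning a single label makes it straightforward to check whether the correct class appears among the top few candidates, which is a natural way to evaluate this kind of automatic classification.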