Paper: Chinese Novelty Mining

ACL ID D09-1162
Title Chinese Novelty Mining
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 2009

Automated mining of novel documents or sentences from chronologically ordered documents or sentences is an open challenge in text mining. In this paper, we describe the preprocessing techniques for detecting novel Chinese text and discuss the influence of different Part of Speech (POS) filtering rules on the detection performance. Experimental results on APWSJ and TREC 2004 Novelty Track data show that the Chinese novelty mining performance is quite different when choosing two dissimilar POS filtering rules. Thus, the selection of words to represent Chinese text is of vital importance to the success of the Chinese novelty mining. Moreover, we compare the Chinese novelty mining performance with that of English and investigate the impact of preprocessing steps on detecting novel ...
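
To make the abstract's central claim concrete, the following is a minimal sketch of how a POS filtering rule can change which sentences are flagged as novel. It is not the paper's implementation: the tag names, the two filtering rules, the cosine-similarity novelty score, and the 0.6 threshold are all assumptions chosen for illustration, and the input is assumed to be already word-segmented and POS-tagged.

```python
import math
from collections import Counter

# Hypothetical POS filtering rules (tag names are illustrative, not from the paper).
CONTENT_ONLY = {"n", "v", "a"}                 # keep nouns, verbs, adjectives
ALL_TAGS = {"n", "v", "a", "t", "u", "ns"}     # keep every tag used below


def pos_filter(tagged_tokens, keep_tags):
    """Keep only the words whose POS tag is in keep_tags."""
    return [word for word, tag in tagged_tokens if tag in keep_tags]


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mine_novel(stream, keep_tags, threshold=0.6):
    """Flag each sentence as novel if its maximum similarity to every
    earlier sentence in the stream is below the (assumed) threshold."""
    history, flags = [], []
    for tagged in stream:
        vec = Counter(pos_filter(tagged, keep_tags))
        max_sim = max((cosine(vec, h) for h in history), default=0.0)
        flags.append(max_sim < threshold)
        history.append(vec)
    return flags


# Toy, chronologically ordered stream of pre-tagged sentences (made-up data).
stream = [
    [("股市", "n"), ("今天", "t"), ("上涨", "v")],
    [("股市", "n"), ("昨天", "t"), ("上涨", "v"), ("了", "u")],
    [("台风", "n"), ("登陆", "v"), ("广东", "ns")],
]

flags_content = mine_novel(stream, CONTENT_ONLY)
flags_all = mine_novel(stream, ALL_TAGS)
print(flags_content)
print(flags_all)
```

In this toy stream, the second sentence differs from the first only in a time word and a particle. Under the content-only rule the two sentences reduce to the same words and the second is treated as redundant, while keeping all tags lowers their similarity enough for it to be flagged as novel at this threshold.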