Paper: Classifying Chinese Texts in Two Steps

ACL ID I05-1027
Title Classifying Chinese Texts in Two Steps
Venue International Joint Conference on Natural Language Processing
Session Main Conference
Year 2005
Authors
  • Xinghua Fan (Tsinghua University, Beijing, China; State Intellectual Property Office of P.R., Beijing, China; Korea Advanced Institute of Science and Technology, Korea)
  • Maosong Sun (Tsinghua University, Beijing, China)
  • Key-Sun Choi (Korea Advanced Institute of Science and Technology, Korea)
  • Qin Zhang (State Intellectual Property Office of P.R., Beijing, China)

This paper proposes a two-step method for Chinese text categorization (TC). In the first step, a Naïve Bayesian classifier is used to determine the fuzzy area between two categories. In the second step, a classifier with more subtle and powerful features is used to handle the documents in the fuzzy area, whose first-step decisions are considered unreliable. A preliminary experiment validated the soundness of this method. The method is then extended from two-class TC to multi-class TC. Within this two-step framework, we further try to improve the classifier by taking the dependencies among features into consideration in the second step, resulting in a Causality Naïve Bayesian Classifier.
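The two-step idea can be illustrated with a minimal sketch, written under several assumptions that are not taken from the paper. Step 1 is assumed to be a two-class multinomial Naïve Bayes over words. The fuzzy area is assumed to be the set of documents whose log-odds score lies within a hypothetical `threshold` of zero. Word bigrams stand in for the "more subtle and powerful features" of step 2. The names `NaiveBayes`, `bigrams`, `two_step_classify`, the threshold value and the toy, already word-segmented data are all illustrative; this is not the authors' exact formulation.

```python
import math
from collections import Counter
from typing import Callable, List, Sequence, Tuple


class NaiveBayes:
    """Two-class multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self, docs: Sequence[List[str]], labels: Sequence[int]) -> None:
        # Each doc is a list of feature tokens; each label is 0 or 1.
        self.counts = [Counter(), Counter()]
        self.priors = [0, 0]
        for tokens, y in zip(docs, labels):
            self.counts[y].update(tokens)
            self.priors[y] += 1
        self.totals = [sum(c.values()) for c in self.counts]
        self.vocab = set(self.counts[0]) | set(self.counts[1])

    def _log_prob(self, token: str, y: int) -> float:
        # Smoothed log P(token | class y).
        return math.log((self.counts[y][token] + 1) / (self.totals[y] + len(self.vocab)))

    def score(self, tokens: List[str]) -> float:
        """Log-odds log P(c1|d) - log P(c0|d); positive favours class 1."""
        s = math.log(self.priors[1] / self.priors[0])
        for t in tokens:
            if t in self.vocab:  # ignore features never seen in training
                s += self._log_prob(t, 1) - self._log_prob(t, 0)
        return s

    def predict(self, tokens: List[str]) -> int:
        return 1 if self.score(tokens) > 0 else 0


def bigrams(tokens: List[str]) -> List[str]:
    # Hypothetical second-step features: adjacent word pairs.
    return [a + "_" + b for a, b in zip(tokens, tokens[1:])]


def two_step_classify(
    tokens: List[str],
    step1: NaiveBayes,
    step2: Callable[[List[str]], int],
    threshold: float,
) -> Tuple[int, int]:
    """Return (label, step that decided it)."""
    s = step1.score(tokens)
    if abs(s) >= threshold:  # outside the fuzzy area: trust step 1
        return (1 if s > 0 else 0), 1
    return step2(tokens), 2  # inside the fuzzy area: defer to step 2


# Toy, already-segmented training data (class 0 = sports, class 1 = finance).
train = [["team", "match", "goal"], ["match", "champion", "team"],
         ["stock", "market", "rise"], ["market", "bank", "stock"]]
labels = [0, 0, 1, 1]

nb_words = NaiveBayes(train, labels)
nb_bigrams = NaiveBayes([bigrams(d) for d in train], labels)


def step2(toks: List[str]) -> int:
    # Second-step classifier: Naive Bayes over bigram features.
    return nb_bigrams.predict(bigrams(toks))


print(two_step_classify(["stock", "market"], nb_words, step2, 1.5))          # (1, 1)
print(two_step_classify(["market", "team", "match"], nb_words, step2, 1.5))  # (0, 2)
```

Raising `threshold` widens the fuzzy area, so more documents are passed to the costlier second step. A threshold of 0 sends every document through step 1 only, which reduces the scheme to plain Naïve Bayes.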