Paper: A New Feature Selection Score For Multinomial Naive Bayes Text Classification Based On KL-Divergence

ACL ID P04-3024
Title A New Feature Selection Score For Multinomial Naive Bayes Text Classification Based On KL-Divergence
Venue Annual Meeting of the Association for Computational Linguistics
Session System Demonstration
Year 2004
Authors

We define a new feature selection score for text classification based on the KL-divergence between the distribution of words in training documents and their classes. The score favors words that have a similar distribution in documents of the same class but different distributions in documents of different classes. Experiments on two standard data sets indicate that the new method outperforms mutual information, especially for smaller categories.
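
The intuition in the abstract can be illustrated with a small sketch. This is not the paper's exact score, which the abstract does not spell out; it is one plausible reading, under stated assumptions. For each word it averages, over training documents, that word's contribution to KL(document || all documents) minus KL(document || own class), which simplifies to p(w|d) * log(p(w|c_d) / p(w)). The function name `kl_word_scores`, the Laplace smoothing parameter `alpha`, and the toy data are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def kl_word_scores(docs, labels, alpha=1.0):
    """Illustrative per-word score (an assumption, not the paper's formula).

    For each word w, average over documents d of
        p(w|d) * log(p(w|class of d) / p(w)),
    i.e. w's share of KL(d || global) - KL(d || own class).
    """
    vocab = sorted({w for doc in docs for w in doc})
    v = len(vocab)

    # Word counts per class and over the whole training set.
    class_counts = defaultdict(Counter)
    total_counts = Counter()
    for doc, label in zip(docs, labels):
        class_counts[label].update(doc)
        total_counts.update(doc)
    class_totals = {c: sum(cnt.values()) for c, cnt in class_counts.items()}
    n_total = sum(total_counts.values())

    def smoothed(counts, n, w):
        # Laplace-smoothed multinomial estimate.
        return (counts[w] + alpha) / (n + alpha * v)

    scores = dict.fromkeys(vocab, 0.0)
    for doc, label in zip(docs, labels):
        if not doc:
            continue  # an empty document contributes nothing
        for w, k in Counter(doc).items():
            p_w_d = k / len(doc)
            p_w_c = smoothed(class_counts[label], class_totals[label], w)
            p_w = smoothed(total_counts, n_total, w)
            scores[w] += p_w_d * math.log(p_w_c / p_w) / len(docs)
    return scores


# Toy data (hypothetical): "the" is spread evenly over both classes,
# while "goal" and "vote" are each concentrated in one class.
docs = [
    ["goal", "match", "the"],
    ["goal", "team", "the"],
    ["vote", "party", "the"],
    ["vote", "election", "the"],
]
labels = ["sport", "sport", "politics", "politics"]

scores = kl_word_scores(docs, labels)
ranked = sorted(scores, key=scores.get, reverse=True)
```

On this toy input the class-specific words rank above the word shared by both classes, which matches the behavior the abstract describes. A feature selector built on such a score would keep the top-ranked words and discard the rest before training the multinomial naive Bayes classifier.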