Paper: A Framework of Feature Selection Methods for Text Categorization

ACL ID P09-1078
Title A Framework of Feature Selection Methods for Text Categorization
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 2009
Authors

In text categorization, feature selection (FS) is a strategy that aims to make text classifiers more efficient and accurate. However, when facing a new task, it is still difficult to quickly choose a suitable method from the many FS methods proposed in previous studies. In this paper, we propose a theoretical framework of FS methods based on two basic measurements: frequency measurement and ratio measurement. Six popular FS methods are then discussed in detail under this framework. Moreover, guided by our theoretical analysis, we propose a novel method called weighed frequency and odds (WFO) that combines the two measurements with trained weights. The experimental results on data sets from both topic-based and sentiment classification tasks show that thi...
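
To make the two measurements concrete, here is a minimal Python sketch of a score that combines a frequency measurement and a ratio measurement with one weight. The abstract does not give the formula, so everything here is an assumption for illustration: the frequency measurement is taken to be the smoothed document frequency P(t|c), the ratio measurement is taken to be P(t|c) / P(t|not c), and the two are combined as P(t|c)^lam * log(ratio)^(1 - lam). The names `doc_frequencies`, `wfo_like_score`, and `lam` are hypothetical, and the toy documents are made up.

```python
import math
from collections import Counter


def doc_frequencies(docs, labels, target):
    """Count, per term, the documents containing it inside and outside the target class."""
    pos, neg = Counter(), Counter()
    n_pos = n_neg = 0
    for tokens, label in zip(docs, labels):
        if label == target:
            n_pos += 1
            pos.update(set(tokens))  # set(): count each term once per document
        else:
            n_neg += 1
            neg.update(set(tokens))
    return pos, neg, n_pos, n_neg


def wfo_like_score(term, pos, neg, n_pos, n_neg, lam=0.5):
    """Assumed combination of a frequency and a ratio measurement (not taken from the abstract)."""
    # Frequency measurement (assumption): Laplace-smoothed P(t | c) and P(t | not c).
    p_pos = (pos[term] + 1) / (n_pos + 2)
    p_neg = (neg[term] + 1) / (n_neg + 2)
    # Ratio measurement (assumption): how much more likely the term is in the class.
    ratio = p_pos / p_neg
    if ratio <= 1:
        return 0.0  # the term is not indicative of the target class
    # lam = 1 uses only the frequency part, lam = 0 only the log-ratio part.
    return (p_pos ** lam) * (math.log(ratio) ** (1 - lam))


# Toy, made-up corpus for a two-class sentiment task.
docs = [["good", "movie"], ["good", "plot"], ["bad", "movie"], ["bad", "plot"]]
labels = ["pos", "pos", "neg", "neg"]
stats = doc_frequencies(docs, labels, target="pos")
ranked = sorted({t for d in docs for t in d},
                key=lambda t: wfo_like_score(t, *stats),
                reverse=True)
```

In this sketch `lam` plays the role of the trained weight: it would typically be tuned on held-out data, with larger values favoring frequent terms and smaller values favoring terms concentrated in one class.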