Paper: Negative Training Data Can be Harmful to Text Classification

ACL ID D10-1022
Title Negative Training Data Can be Harmful to Text Classification
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 2010
Authors

This paper studies the effects of training data on binary text classification and postulates that negative training data is not needed and may even be harmful for the task. Traditional binary classification involves building a clas- sifier using labeled positive and negative training examples. The classifier is then ap- plied to classify test instances into positive and negative classes. A fundamental assump- tion is that the training and test data are iden- tically distributed. However, this assumption may not hold in practice. In this paper, we study a particular problem where the positive data is identically distributed but the negative data may or may not be so. Many practical text classification and retrieval applications fit this model. We argue that in this setting nega...