Paper: A Probabilistic Model For Text Categorization: Based On A Single Random Variable With Multiple Values

ACL ID A94-1027
Title A Probabilistic Model For Text Categorization: Based On A Single Random Variable With Multiple Values
Venue Applied Natural Language Processing Conference
Session Main Conference
Year 1994
Authors

Text categorization is the classification of documents with respect to a set of predefined categories. In this paper, we propose a new probabilistic model for text categorization that is based on a Single random Variable with Multiple Values (SVMV). Compared to previous probabilistic models, our model has the following advantages: 1) it considers within-document term frequencies, 2) it considers term weighting for target documents, and 3) it is less affected by insufficient training cases. We verify our model's superiority over the others in the task of categorizing news articles from the "Wall Street Journal".