Paper: Automatic Text Categorization By Unsupervised Learning

ACL ID C00-1066
Title Automatic Text Categorization By Unsupervised Learning
Venue International Conference on Computational Linguistics
Session Main Conference
Year 2000

The goal of text categorization is to classify documents into a certain number of predefined categories. Previous work in this area has used a large number of labeled training documents for supervised learning. One problem is that labeled training documents are difficult to create: while it is easy to collect unlabeled documents, it is not so easy to categorize them manually to produce training documents. In this paper, we propose an unsupervised learning method to overcome these difficulties. The proposed method divides the documents into sentences and categorizes each sentence using keyword lists for each category and a sentence similarity measure. It then uses the categorized sentences for refining. The proposed method shows a similar degree of performance, compa...
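
The sentence-level step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the keyword lists, the sentence splitter, or the similarity measure, so everything here is an assumption made for illustration: the `KEYWORDS` dictionary is hypothetical, sentences are split naively on punctuation, and Jaccard overlap of word sets stands in for the paper's sentence similarity measure. The later step that uses the categorized sentences is not covered.

```python
import re

# Hypothetical keyword lists; the abstract does not give the actual ones.
KEYWORDS = {
    "sports": {"match", "team", "goal"},
    "finance": {"stock", "market", "bank"},
}


def split_sentences(text):
    """Naive splitter on '.', '!' and '?' (an assumption, not the paper's method)."""
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]


def tokens(sentence):
    """Lowercased set of alphabetic words in a sentence."""
    return set(re.findall(r"[a-z]+", sentence.lower()))


def jaccard(a, b):
    """Jaccard overlap of two word sets; a stand-in similarity measure."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def categorize_sentences(sentences, keywords):
    """Label sentences by keyword hits, then label the rest by similarity
    to the sentences that were labeled by keywords."""
    labeled, pending = [], []
    for s in sentences:
        toks = tokens(s)
        hits = {cat: len(toks & kw) for cat, kw in keywords.items()}
        best = max(hits, key=hits.get)
        if hits[best] > 0:
            labeled.append((s, best))
        else:
            pending.append(s)

    seeds = list(labeled)  # only keyword-labeled sentences act as references
    for s in pending:
        toks = tokens(s)
        scored = [(jaccard(toks, tokens(t)), cat) for t, cat in seeds]
        score, cat = max(scored, default=(0.0, None))
        # Leave the sentence unlabeled (None) when nothing overlaps.
        labeled.append((s, cat if score > 0 else None))
    return labeled


if __name__ == "__main__":
    text = (
        "The team scored a late goal. "
        "The bank raised rates on the stock market. "
        "Everyone scored late. Rain fell."
    )
    for sentence, category in categorize_sentences(split_sentences(text), KEYWORDS):
        print(category, "|", sentence)
```

In this sketch a sentence that contains no keyword inherits the category of its most similar keyword-labeled sentence, and stays unlabeled when it shares no words with any of them. A real system would likely need a threshold on the similarity score rather than accepting any nonzero overlap.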