Paper: Text Classification based on the Latent Topics of Important Sentences extracted by the PageRank Algorithm

ACL ID P13-3007
Title Text Classification based on the Latent Topics of Important Sentences extracted by the PageRank Algorithm
Venue Annual Meeting of the Association for Computational Linguistics
Session Student Session
Year 2013
Authors

In this paper, we propose a method to raise the accuracy of text classification based on latent topics, reconsidering the techniques necessary for good classification. For example, to decide important sentences in a document, the sentences with important words are usually regarded as important sentences. In this case, tf.idf is often used to decide important words. On the other hand, we apply the PageRank algorithm to rank important words in each document. Furthermore, before clustering documents, we refine the target documents by representing them as a collection of important sentences in each document. We then classify the documents based on latent information in the documents. As a clustering method, we employ the k-means algorithm and investigate how our proposed met...
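The word-ranking and sentence-selection steps named in the abstract can be illustrated with a minimal sketch. The abstract does not say how the word graph is built or how word scores are combined into sentence scores, so the sketch assumes a sentence-level co-occurrence graph and ranks sentences by the average PageRank score of their words. All function names and parameters (tokenize, build_cooccurrence_graph, pagerank, top_sentences, damping, iterations, k) are hypothetical. The latent-topic and k-means steps are not covered.

```python
import re
from collections import defaultdict


def tokenize(sentence):
    """Lowercase a sentence and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def build_cooccurrence_graph(sentences):
    """Undirected word graph: two distinct words are linked if they share
    a sentence. This graph construction is an assumption of the sketch."""
    graph = defaultdict(set)
    for sentence in sentences:
        words = set(tokenize(sentence))
        for word in words:
            graph[word] |= words - {word}
    return graph


def pagerank(graph, damping=0.85, iterations=50):
    """Plain power-iteration PageRank over an undirected adjacency map."""
    nodes = list(graph)
    n = len(nodes)
    if n == 0:
        return {}
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by words with no neighbours is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not graph[v])
        new_rank = {}
        for node in nodes:
            incoming = sum(rank[nb] / len(graph[nb]) for nb in graph[node])
            new_rank[node] = (1 - damping) / n + damping * (incoming + dangling / n)
        rank = new_rank
    return rank


def top_sentences(sentences, k=2):
    """Keep the k sentences with the highest average word rank,
    returned in their original document order (scoring rule is assumed)."""
    rank = pagerank(build_cooccurrence_graph(sentences))

    def score(sentence):
        words = set(tokenize(sentence))
        return sum(rank[w] for w in words) / len(words) if words else 0.0

    best = sorted(range(len(sentences)),
                  key=lambda i: score(sentences[i]), reverse=True)[:k]
    return [sentences[i] for i in sorted(best)]
```

Calling top_sentences on a list of sentence strings from one document yields the reduced document that would then be passed to a topic model and clustered. Averaging rather than summing word scores is a design choice made here to avoid favouring long sentences.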